GPT2 using MLX

Posted by Disastrous-Maybe2501@reddit | LocalLLaMA | 3 comments

Hi all, I was learning LLM pre-training from Andrej Karpathy's NanoGPT and decided to try it out using MLX. I originally thought it would be more or less a simple translation from PyTorch to MLX, but it turned out to be much trickier than that. I published my code and documented what I learned in a blog post included in the repo. I'll kick off full training on fineweb on my M3 Max and will publish the training results to the repo once I have them. Any thoughts and feedback are welcome, here or directly on the repo. Thanks!

Gregory-Wolf@reddit

Which M3 Max do you have? The 128GB MBP? How long do you think pretraining will take?

I wonder if the battery will eventually suffer from overheating (heat from the GPU/CPU will accumulate inside the notebook and may damage the battery cells).

Interesting stuff regardless.

Disastrous-Maybe2501@reddit (OP)

Good point. I have an M3 Max with 64GB of memory. Pre-training will take about a week. I've run it for a day now and overheating doesn't seem to be an issue yet. I'll continue to monitor.

nekofneko@reddit

nice!