GPT-2 using MLX
Posted by Disastrous-Maybe2501@reddit | LocalLLaMA
Hi all, I was learning LLM pre-training from Andrej Karpathy's NanoGPT and decided to try it out using MLX. I originally thought it would be a more or less straightforward translation from PyTorch to MLX, but it turned out to be much trickier than that. I published my code and documented my learnings in a blog post included in the repo. I'll kick off full training on FineWeb on my M3 Max and will publish the training results to the repo once I have them. Any thoughts and feedback are welcome, here or directly on the repo. Thanks!
Gregory-Wolf@reddit
Which M3 Max do you have? A 128GB MBP? How much time do you think pretraining will take?
I wonder if the battery will eventually suffer from overheating (the heat from the GPU/CPU will accumulate inside the notebook and may damage the battery cells).
Interesting stuff regardless.
Disastrous-Maybe2501@reddit (OP)
Good point. I have an M3 Max with 64GB of memory. Pre-training will take about a week. I've run it for a day now and overheating doesn't seem to be an issue yet. I'll continue to monitor.
nekofneko@reddit
nice!