Fine-tuning LLMs on DGX Spark: numbers from the NVIDIA webpage

Posted by siegevjorn@reddit | LocalLLaMA | 2 comments

https://blogs.nvidia.com/blog/rtx-ai-garage-fine-tuning-unsloth-dgx-spark/

Hi, I'd like to discuss the DGX Spark performance numbers from "How to Fine-Tune an LLM on Nvidia GPUs With Unsloth".

### Llama 3.3 70B

- Method: QLoRA
- Backend: PyTorch
- Config:
  - Sequence length: 2,048
  - Batch size: 8
  - Epoch: 1
  - Steps: 125
  - FP4
- Peak tokens/sec: 5,079.04

If you assume training on 100M tokens, then 100M / 5,079 / 3,600 ≈ 5.47 hours. That doesn't seem too bad, for what it's worth, for a mini machine that can fine-tune Llama 3.3 70B with QLoRA. Is there a catch? Is this a realistic number?
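For anyone who wants to redo the arithmetic with their own numbers, here is a rough sketch in Python. It only reuses the figures quoted above; the function name and the assumption that one step processes exactly sequence length × batch size tokens (no gradient accumulation, fully packed sequences) are mine, not something the NVIDIA page states.

```python
def hours_to_train(total_tokens: float, tokens_per_sec: float) -> float:
    """Wall-clock hours needed to process total_tokens at a constant throughput."""
    return total_tokens / tokens_per_sec / 3600


# Figures quoted from the benchmark table in the post.
PEAK_TOKENS_PER_SEC = 5079.04
SEQ_LEN = 2048
BATCH_SIZE = 8
STEPS = 125

# Assumption: each step consumes SEQ_LEN * BATCH_SIZE tokens
# (no gradient accumulation, every sequence filled to full length).
tokens_per_step = SEQ_LEN * BATCH_SIZE
benchmark_tokens = tokens_per_step * STEPS

# The 100M-token dataset size is the hypothetical from the post.
est_hours = hours_to_train(100_000_000, PEAK_TOKENS_PER_SEC)
benchmark_minutes = hours_to_train(benchmark_tokens, PEAK_TOKENS_PER_SEC) * 60

print(f"tokens per step:        {tokens_per_step:,}")
print(f"tokens in benchmark:    {benchmark_tokens:,}")
print(f"benchmark at peak rate: {benchmark_minutes:.1f} min")
print(f"100M tokens at peak:    {est_hours:.2f} h")
```

Under that assumption the benchmark run itself covers only about 2.05M tokens, roughly 2% of the 100M-token scenario, and the figure used is a peak rate, so the estimate is a lower bound on wall-clock time rather than a prediction.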