Fine-tuning LLMs on DGX Spark, from the NVIDIA webpage

Posted by siegevjorn@reddit | LocalLLaMA | View on Reddit | 2 comments

https://blogs.nvidia.com/blog/rtx-ai-garage-fine-tuning-unsloth-dgx-spark/

Hi, I'd like to discuss the DGX Spark performance numbers from "How to Fine-Tune an LLM on Nvidia GPUs With Unsloth".

### Llama 3.3 70B

- Method: QLoRA
- Backend: PyTorch
- Config:
  - Sequence length: 2,048
  - Batch size: 8
  - Epoch: 1
  - Steps: 125FP4
- Peak Tokens/Sec: 5,079.04

If you assume training on 100M tokens, then 100M / 5,079 / 3,600 ≈ 5.47 hours. That doesn't seem too bad, for what it's worth: a mini machine that can fine-tune Llama 3.3 70B with QLoRA. Is there a catch? Is this a realistic number?
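The estimate above is easy to reproduce and vary. Here is a minimal Python sketch of the same arithmetic, under some assumptions. The step count is read as 125 (the post shows "125FP4"). The `sustained_fraction` parameter is a hypothetical knob, not something from the NVIDIA page, for exploring what happens if sustained throughput falls below the quoted peak.

```python
# Back-of-the-envelope training-time estimate from the quoted benchmark figures.
# Assumptions (not confirmed by the source): STEPS is read as 125 from "125FP4",
# and sustained_fraction is a hypothetical knob for sustained-vs-peak throughput.

PEAK_TOKENS_PER_SEC = 5079.04  # "Peak Tokens/Sec" from the post
SEQ_LEN = 2048                 # sequence length from the post
BATCH_SIZE = 8                 # batch size from the post
STEPS = 125                    # assumed reading of "125FP4"


def training_hours(total_tokens: float,
                   tokens_per_sec: float = PEAK_TOKENS_PER_SEC,
                   sustained_fraction: float = 1.0) -> float:
    """Hours needed to process total_tokens at a fraction of the given throughput."""
    if not 0.0 < sustained_fraction <= 1.0:
        raise ValueError("sustained_fraction must be in (0, 1]")
    return total_tokens / (tokens_per_sec * sustained_fraction) / 3600.0


# Tokens covered by the benchmark run itself, if every sequence is full length.
benchmark_tokens = SEQ_LEN * BATCH_SIZE * STEPS
benchmark_seconds = benchmark_tokens / PEAK_TOKENS_PER_SEC

# The 100M-token scenario from the post, at peak throughput.
hours_100m = training_hours(100e6)

if __name__ == "__main__":
    print(f"Benchmark run: {benchmark_tokens:,} tokens, ~{benchmark_seconds:.0f} s at peak")
    print(f"100M tokens at peak: {hours_100m:.2f} h")
    print(f"100M tokens at 70% of peak: {training_hours(100e6, sustained_fraction=0.7):.2f} h")
```

Under those assumptions the benchmark run covers only about 2M tokens, so the 100M-token figure extrapolates a peak rate from a short run. The `sustained_fraction` argument shows how sensitive the estimate is to that.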

2 Comments

Extension-Bass-2338@reddit

Those numbers look legit for QLoRA on decent hardware. The DGX Spark has H200s, so that throughput makes sense. The "catch" is probably the usual suspects: memory constraints if you want longer sequences, and QLoRA obviously isn't full fine-tuning, so quality might not be as good depending on your use case. But for most people that speed is pretty solid.

Tyme4Trouble@reddit

DGX Spark has the equivalent of a 5070 in terms of compute.
