Has anyone gotten hold of DGX Spark for running local LLMs?
Posted by Chance-Studio-8242@reddit | LocalLLaMA | 82 comments

DGX Spark is apparently one of TIME's Best Inventions of 2025!
raphaelamorim@reddit
Performance review of it https://youtu.be/zs-J9sKxvoM?si=237f_mBVyLH7QBOE
Vozer_bros@reddit
I am quite sure Nvidia wants to use this device as an experiment, to guide people into the Nvidia CUDA world. But this product will NEVER match the performance of their current server products for a user.
For me, I do hope they drop some good shit that I can finally finetune all day.
MLisdabomb@reddit
I don't understand how you can win a product of the year award for a product that hasn't been released yet. Nvidia has a pretty damn good marketing dept.
ilarp@reddit
haha TIME, a respected voice in the tech and AI space
atape_1@reddit
The people that made the decision probably never even heard of the AMD 395.
-dysangel-@reddit
or Macs. I was going to get a DIGITS, then saw the Mac Studio had more and faster RAM. I accepted the lack of proper CUDA and bit the bullet, and have been happy so far.
swagonflyyyy@reddit
Mac isn't too far behind in the AI game. Their performance is impressive despite lack of CUDA.
xrvz@reddit
Can buy a Mac. Can't buy a Dickits. Mac is not "not too far behind", but ahead.
swagonflyyyy@reddit
I'd say they can definitely be ahead in local AI on edge devices, and I totally expect them to go this route by next year.
ForsookComparison@reddit
The people that made the decision probably can't open a PDF
MitsotakiShogun@reddit
Or had different goals? Like:
* CUDA support (so you can actually run (almost) everything, like Qwen3-Next)
* Having a local system that's equivalent to a production machine
* More compute, like ~5-10x more
* Faster networking. Half the 395 systems don't even have 10 GbE; this one has a "tiny" bit more.
mattate@reddit
Qwen3-Next had MLX support for Macs on like day 3, just wanted to throw that in there. CUDA at this level of personal AI compute is not a necessity unless you are training.
MitsotakiShogun@reddit
Fair enough. How about 395 and llamacpp?
jesus359_@reddit
User that bit the bullet as well and got a Mac Mini M4 with 32GB. It's PRETTY good for consumer-grade use if you're not training or getting too technical with it.
LMStudio and OpenWebUI with models 32B and down at Q4_M doing 18 t/s, I think it's pretty good. OSS-20B and Qwen3-30B Coder/Thinking/Instruct are great and fly. Gemma 27B, MedGemma, Mistral/Magistral/Devstral are good.
Not to mention llamacpp with models under 7B, and you've got some pretty nice cases for getting Qwen2.5VL or Gemma3:4B to read images and pass them on to OSS-20B or Qwen3-30B.
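Something like this, if anyone's curious — a rough sketch assuming two llama.cpp servers running locally with their OpenAI-compatible endpoints (the ports, file names, and model picks are placeholders, not exact commands):

```python
# Two-stage local pipeline: a small vision model describes the image,
# then a bigger text model reasons over the description.
# Assumes two llama-server instances, e.g. a Qwen2.5-VL build on :8080
# and gpt-oss-20b on :8081 (both hypothetical setups).
import base64
import requests

def describe_image(path: str) -> str:
    """Ask the vision model to describe an image (sent as a base64 data URL)."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }]},
        timeout=300,
    )
    return resp.json()["choices"][0]["message"]["content"]

def ask_about_image(path: str, question: str) -> str:
    """Pass the vision model's description on to the text model."""
    resp = requests.post(
        "http://localhost:8081/v1/chat/completions",
        json={"messages": [{
            "role": "user",
            "content": f"Image description:\n{describe_image(path)}\n\n{question}",
        }]},
        timeout=300,
    )
    return resp.json()["choices"][0]["message"]["content"]

print(ask_about_image("chart.png", "Summarize what this chart shows."))
```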
MitsotakiShogun@reddit
Yeah, for sure the 395 is not a dedicated AI machine. The main reason I got the 395 was to use it as a general server (~20 docker stacks, ~30-40 containers) AND run a small model, which is why I do not regret it.
All I said in my initial comment was that CUDA (which comes with DGX Spark but not the others) unlocks options that aren't available elsewhere. Qwen3-Next was a single example that got nitpicked (someone even replied in that comment chain completely forgetting that CUDA was my first bullet point D:), but it's not the only thing. Even loading models in transformers (not even running inference) can cause you trouble. If I wanted a dedicated AI dev machine, DGX Spark is just better because of the software/hardware compatibility. If you want a machine that just runs most LLMs at okay speeds, yeah, sure, go get a Mac or a 395.
CryptographerKlutzy7@reddit
If the 395 isn't an AI machine, then the Spark REALLY isn't. Same memory bandwidth, so same inference speed.
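Back-of-envelope, since single-stream decode has to stream the active weights through memory once per token (spec-sheet bandwidths, not benchmarks):

```python
# Decode is memory-bandwidth-bound: every generated token reads the
# active weights once, so tok/s ≈ bandwidth / bytes per token.
def decode_tok_per_s(bandwidth_gb_s: float, active_params_b: float,
                     bytes_per_param: float = 0.5) -> float:  # 0.5 ≈ Q4
    return bandwidth_gb_s / (active_params_b * bytes_per_param)

for name, bw in [("DGX Spark", 273), ("Ryzen AI Max+ 395", 256)]:
    # e.g. a dense 70B model at ~4-bit quantization
    print(f"{name}: ~{decode_tok_per_s(bw, 70):.0f} tok/s on a dense 70B @ Q4")
```

Both land within a token or two per second of each other, which is the whole point.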
MitsotakiShogun@reddit
There we go again... I literally commented about this a few comments up this specific chain: CUDA, compute, networking, architecture.
jesus359_@reddit
I'll agree that CUDA is king. It's been proven time and time again because development was done for CUDA. AMD and Intel have yet to make something that competes with it. I know nothing about this machine though.
CryptographerKlutzy7@reddit
and yet NONE of that makes it more of an AI box than the Halo is.
That is the point.
Miserable-Dare5090@reddit
I see your point, but that assumes all that matters is bandwidth. The compute cores are insane in the dgx. You don’t have the same level of compute in the 395, which is the point above you.
If you consider "AI machine" to mean "runs AI models," then yes. If you consider it to mean "made for training machine learning tasks," then no, neither the Mac chips nor the AI Ryzen chips are that.
CryptographerKlutzy7@reddit
But then again, neither REALLY is the Spark, because again, the bandwidth hobbles it hard. It can't move the model through GPU memory fast enough to run it quickly, and even in training the bandwidth restrictions are brutal.
You'd be looking for tasks that require LARGE amounts of memory but do intense processing on a tiny slice of it, completely in isolation.
It's _really_ just Nvidia crippling their box on purpose so they can segment out their higher-bandwidth solutions.
They were angling for it to come out before the Halo, when people would have accepted a premium for it, but they missed that window.
People keep saying "oh, there are tasks which need more processor," which is TRUE, but those tasks also need _way_ more memory than what is on the chip, so they are absolutely wrecked by the bandwidth restrictions regardless.
You can't use the power it has for training, data clustering, inference, even regular ML work, natural language processing, etc.
It's been purposely crippled so it doesn't eat any of their higher-priced offerings, but AMD doesn't care about that restriction, so they could push something equal to it for half the price without worrying about their market. They get to eat Nvidia's segments, not their own.
Medusa will be even more brutal. The Spark will be competing with boxes that have even more memory and a lot more bandwidth. I think Nvidia has basically given up that part of the market, the same way Intel did with AMD back in the day, and for similar reasons.
sudochmod@reddit
Llamacpp works just fine and several in the community have gotten vLLM to work with ROCm wheels.
MitsotakiShogun@reddit
Read one more level up the comment chain. Or two.
Doesn't seem to be there yet:
* https://github.com/ggml-org/llama.cpp/issues/15940
* https://github.com/vllm-project/vllm/issues/24944
sudochmod@reddit
Oh, you're talking about Qwen3 80B specifically for llamacpp.
I'm fairly certain kyuzo has a toolbox for vLLM on the Strix. I can find it when I get on.
MitsotakiShogun@reddit
In the comment you replied to, yes, but the (relevant) bullet point from my first comment was saying that CUDA support is important if you want to do things other than simple LLM inference.
Maybe you want to load some model in transformers and alter how a pytorch module works, or you want to run software that only supports Nvidia GPUs (Nvidia Broadcast is pretty nice), or you want faster networking, or you want to develop software specifically for GH* architectures, or want to experiment with infiniband.
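For a concrete (toy) example of that first case — a minimal sketch; the model name is a placeholder, any causal LM in transformers would do:

```python
# Load a model with transformers and hook into one of its pytorch
# modules -- the kind of tinkering that tends to assume a CUDA stack.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B"  # placeholder; swap in whatever you use
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Log the output norm of the first decoder layer's MLP on each forward pass.
def log_mlp_norm(module, inputs, output):
    print(f"layer-0 MLP output norm: {output.norm().item():.2f}")

model.model.layers[0].mlp.register_forward_hook(log_mlp_norm)

with torch.no_grad():
    model(**tok("Hello there", return_tensors="pt"))
```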
Generally, I'm just saying this machine has a place, a small niche that is not covered by Macs or the 395. It's just not an LLM inference server designed for r/localllama, and people here just shit on it for no good reason.
CryptographerKlutzy7@reddit
Not for no reason. The reason they shit on it is because Nvidia is playing silly buggers with market segmentation.
MitsotakiShogun@reddit
Yup, pretty sure nobody disagrees with that. It's the "this is useless" that I disagree with.
sudochmod@reddit
Yeah I agree it has a niche. I misunderstood what your previous comment was saying. I thought you were saying the 395 couldn’t do llamacpp and I was like “noooooooo it definitely can”.
All good:)
ThenExtension9196@reddit
Amd? For ai? Yeah, no thanks.
No_Understanding3856@reddit
An award sponsored this year, coincidentally, by Nvidia
/s
UltrMgns@reddit
Ah, it was their turn to kiss J's behind. I wonder who's next.
rm-rf-rm@reddit
FTFY. Lost all credibility when it became Benioff's mouthpiece
the320x200@reddit
I don't know about the Time case, maybe they're different, but many of the top X in Y "awards" are literally pay to win.
seppe0815@reddit
How much, 4000 dollars? nope thx
madaerodog@reddit
I am still waiting for mine; it should arrive by end of October, I was informed.
Simusid@reddit
same. The email I got said I will have 4 days before the full release to purchase my pre-order. But still no actual date.
Kandect@reddit
Got an email to purchase one about an hour ago.
Simusid@reddit
I got the notification that I will get that too, I hope it's soon!!
Kandect@reddit
It kind of seems like they want it to be in people's hands by the 23rd. It's 4 days to complete the purchase, 3-4 days before they ship it, and 2 days shipping time.
Simusid@reddit
I'm seriously thinking of getting two :O
alew3@reddit
https://www.reddit.com/r/LocalLLaMA/s/lEQygecPcd
gwestr@reddit
Someone had one in the office. They estimated it is slower than an M4 MBP.
Republic-Appropriate@reddit
Giving an award for something that has not even been tested in the field yet. Whatever.
usernameplshere@reddit
They can even get endorsements for products that don't launch
Miserable-Dare5090@reddit
It's more a device for devs to try CUDA-friendly software before deploying to Nvidia Blackwell chips in the GPU farm in the sky.
It won't run inference faster than a Mac or the 395, but it will have faster prompt processing.
It is technically (as shown in the price) a step down from the RTX Pro 6000 workstation cards. Similar memory size, but the bandwidth is less than 400 GB/s whereas the 6000 has something between 1500 and 1800 GB/s.
I would get one for finetuning and training, not inference or end-user applications necessarily.
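Rough math behind the "faster prompt processing" point: prefill is compute-bound (roughly 2 FLOPs per parameter per prompt token), unlike decode, which is bandwidth-bound. Idealized sketch — the sustained-throughput figure is an assumption, not a benchmark:

```python
# Prefill throughput is limited by compute, not bandwidth: each prompt
# token costs ~2 FLOPs per parameter in a dense forward pass.
def prefill_tok_per_s(sustained_tflops: float, params_b: float) -> float:
    return sustained_tflops * 1e12 / (2 * params_b * 1e9)

# e.g. a dense 70B model, assuming ~100 TFLOPS sustained out of the
# Spark's nominal 1 PFLOP FP4 peak (an assumed utilization, not measured):
print(f"~{prefill_tok_per_s(100, 70):.0f} prompt tokens/s")
```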
FootballRemote4595@reddit
The real value is that it's a development environment for a product series that scales up. So if you can run it on a Spark, you can run it on the other DGX platforms.
Everyone wants to be able to work on dev and deploy on prod without things breaking.
* DGX Spark: 128 GB unified RAM, 1 PFLOP FP4
* DGX Workstation: 784 GB unified RAM, 20 PFLOPS FP4
* DGX H100: 8x H100, 640 GB VRAM, 32 PFLOPS FP8
* DGX SuperPOD: 32 units of 8x H100, 20,480 GB VRAM, 640 PFLOPS FP8
The SuperPOD figure is per rack, and you can have multiple racks.
psilent@reddit
Yep, this is why my company wants them. We work directly with nvidia all the time and still can’t get them though.
AdDizzy8160@reddit
Many people underestimate the fact that with the Spark, you get a machine that works out of the box for AI development (finetuning etc.).
In a business environment, the cost of setting up an alternative is much higher than the price difference versus AMD.
More importantly, when a new paper (with a Git repo) comes out, in most cases you can test it right away. With the others, you can either port it yourself (= costs) or wait (= time).
These are points where AMD needs to take a bit of a lesson, take these things more into its own hands, and better support the dedicated community.
Miserable-Dare5090@reddit
But why the downvote?
AdDizzy8160@reddit
Downvote? Upvote!
DerFreudster@reddit
Shouldn't that be for best graphic of 2025? Best industrial design of something that doesn't exist? Has anyone seen any of these?
tshawkins@reddit
128GB 395s are the norm now, and I can see them either increasing in RAM size or dropping in price over the next year or so. I'm getting ready to retire soon and want a small box for running LLMs, so I'm not shelling out 200+ bucks a month for coding LLMs; I will hang on until the next gen before biting. At the moment grok-code-fast-1 is sufficing, but I'm not sure that will be around forever.
tirolerben@reddit
As long as I can't order it and actually get it delivered, it's vaporware. And if we're already giving vaporware "innovation awards," then I've just invented a portable fusion reactor that can power an entire house. You will be able to order it some day, once I'm in the mood.
Excellent_Produce146@reddit
https://forums.developer.nvidia.com/t/dgx-spark-release-updates/341703/103 - the first people with a reservation on the marketplace were able to place their orders.
Shipment is expected around the 20th of October 2025.
OpenAI already has some boxes and uses them for fine-tuning (pre-production models), as shown in a talk about their gpt-oss model series. They did the fine-tuning with Unsloth on the DGX Spark.
https://youtu.be/1HL2YHRj270?si=kaw5K4zOxHCad-It&t=1178
Edenar@reddit
I hope someone will get one so we'll see how it performs against 395 systems or 128GB Macs.
But I don't think it's targeted at hobbyists like the AMD machines are. The Arm CPU coupled with a small Blackwell chip makes me think it's a dev platform for larger Grace/Blackwell clusters and nothing more. Maybe I'll be wrong, but the price point also makes it hard to justify.
torytyler@reddit
In the time I spent waiting for this, I was able to build a 256GB DDR5 Sapphire Rapids server that has 96GB VRAM and 2 more free PCIe gen 5 slots for expansion.
I know this device has its use cases, and low-wattage performance is needed in some situations, but I'm glad I did more research and got more performance for my money! I was really excited when this device first dropped, then I realized it's not for me lol
Miserable-Dare5090@reddit
How did you get that much hardware for 4k? The 3090s alone would be half at least, and RAM is way more expensive nowadays. Plus CPU, motherboard, SSD, power supply.
torytyler@reddit
I had the 4090 from my gaming PC. I use an engineering sample 112-thread QYFS, which has more memory bandwidth than the Spark does (350 GB/s) and has been VERY reliable, and it was like $110. The motherboard was on sale for $600 (ASUS Sage), 256GB of DDR5 was $1,000, and the 3090s were $600 a piece for all three. Reused my 1000W PSU and grabbed another on Amazon for cheap, like $70…
The 3090s were a good deal. Two just had old thermal paste; the guy sold them as broken because of loud fans… The third is an EVGA water-cooled one with a god-awful loud pump, but I fixed it with a magnet LOL. All in all, it took a few months of getting all the pieces for cheap, but it's doable!
Secure_Reflection409@reddit
Yeh, DDR4 prices are a pisstake now :(
alamacra@reddit
The "desktop AI supercomputer" claim is just so self contradictory... One would expect a "supercomputer" to be, well, superior to what a "computer" can do, but with their claim of one petaflop (5090 has 3.3 at fp4, which I presume is what they are using) it's a fine-tuning station at best. Just call it that.
MoffKalast@reddit
Once marketing people realized that words don't have to mean anything and that you can just straight up lie, we reached rock bottom fairly quickly.
Unlucky_Milk_4323@reddit
It's an overpriced ghost.
waiting_for_zban@reddit
Apple entered the chat. Then AMD. I just wonder how much stock Nvidia promised TIME in return for this promo, for a device that hasn't even launched yet.
Intelligent-Gift4519@reddit
Nvidia doesn't need to pay TIME. They just need to be the most valuable company in the world. TIME just sees "#1 biggest most valuable company that dominates all of AI is introducing a desktop."
Apple? All headlines are about "Apple fails at AI," right?
jesus359_@reddit
But Apple did fail at AI though. They keep promising it. They discontinued their AR Goggle Air to focus on competing with Meta/Ray-Ban.
They fell off the wagon, Tim Apple is about to bounce, and instead of coming up with new things they chose to be competitive and fell behind in doing so (the first chip to fall was AirPower, then the Apple/Hyundai collab for the first AppleCar, then they came out with goggles to compete with Oculus). Then they lost a bunch of people. Their worth right now is just what Apple used to be, not what it is now.
waiting_for_zban@reddit
I am talking about AI hardware though. Like right now, if you look at the market competitors of the Nvidia DGX Spark, it's quite apparent it's not novel.
Apple has been building efficient and performant Arm chips for exactly this purpose, with much higher shared memory, like the latest Mac Studio M3 Ultra with up to 512 GB of unified RAM. On paper this would blow the DGX Spark out of the water. MLX is quite decently supported too.
For a 1-to-1 comparison, AMD has had the Ryzen AI 395 on the market since Jan-Feb 2025, and it has proven itself extremely capable in terms of value for the segment the DGX Spark is aiming at, and at a competitive price.
So again, it's baffling that TIME did minimal research. Even if you ask an LLM, it would give you a better answer.
Miserable-Dare5090@reddit
I am saying this as someone who has an M2 Ultra for AI. The Mac chips will run AI, but they don't train AI as fast or process the computational load as fast as Nvidia silicon. It is not worth defending them. They are different use cases, after all.
Macs have the advantage of being able to run AI models within minutes of unboxing, whereas even AMD machines will need some setting up: possibly changing OS to Linux, driver optimization, runtime optimization, etc. Macs are plug and play. That is a huge advantage for local AI.
But they're not really competing with the core count in Grace Blackwell chips.
waiting_for_zban@reddit
I don't disagree, but the comparison here is the DGX Spark. I am not comparing the Mac Studio or the Ryzen AI to Nvidia GPUs.
So I doubt it will be well suited for training either (remember, the memory bandwidth here is lower than that of the M2 Ultra). The only things going for it are CUDA and the 1 PFLOP FP4 AI compute claim, which is yet to be seen in action, and again bottlenecked by that 128GB of RAM.
I am excited for it to hit the market, because more competition is good; it's just silly imo for TIME to make such claims about an unreleased product.
CryptographerKlutzy7@reddit
By the time the Spark lands, Medusa will be out, and it will have twice the memory and twice the bandwidth, likely at the same price as the Spark. Nvidia has lost the low end of the market with their insistence on segmentation.
The_Hardcard@reddit
You are right about Apple. All the headlines are about Apple Intelligence, none about Mac Studios being able to run huge open-source models that the Nvidia and AMD consumer boxes can't touch.
No headlines about the upcoming Studios with 4x the compute that will massively boost the prompt processing and long context performance in LLMs and image generation speed to go along with the already superior memory bandwidth.
Next summer, Apple will have the definitive boxes for local SOTA.
power97992@reddit
I'm waiting for a 128 GB M5 or M6 Max for less than 3200 USD… (most likely it will be 4700 or 4500 USD, but I can hope)…
A 256 GB M5 Max and a 384 GB M6 Max will be crazy… the 2026 Mac Studio will have 1TB of unified RAM….
Western-Source710@reddit
There's already a Studio with 1tb of memory. It's like $10k, though.
moofunk@reddit
It has 512 GB memory.
Western-Source710@reddit
I stand corrected. I blame medications for destroying my memory. My apologies, here's your upvote xo
VoidAlchemy@reddit
I heard a rumor that Wendell over at Level1Techs (YT channel and forums) might have something in the works about this. In the meantime he just reviewed the 128GB Minisforum MS S1 Max AI, including a good discussion of the CPU vs GPU memory bandwidth and how it could be hooked up to a discrete GPU for more power. Curious how these kinds of devices will pan out for home inferencing.
IngeniousIdiocy@reddit
The Nvidia dev kits for their actual AI IoT chips are cheaper, have twice the GPU FLOPS (although a weaker CPU), AND you can have one delivered in two days (in the continental US).
Miserable-Dare5090@reddit
Sorry, I am confused as to which developer kit you meant:
- NVIDIA Jetson AGX Orin 64GB Developer Kit: 204 GB/s bandwidth, 275 TOPS (INT8)
versus
- NVIDIA DGX Spark: 273 GB/s bandwidth, 1 PFLOP FP4
IngeniousIdiocy@reddit
The Jetson AGX Thor has 2 petaflops FP4 to the DGX Spark's 1 petaflop… and only costs $3,500, although I just checked and they are on back order now. They were sitting in warehouses last month. It seems the back order is short, with a target shipment date of November.
Turkino@reddit
I still don't get how it's a "best invention" when:
1. It's not a novel invention.
2. It's not even out, so how can it be a "best"?
Feels like it's a "pay to place" spot on this list.
ThenExtension9196@reddit
What’s funny is that this hardware was due like early summer lol
AdLumpy2758@reddit
At this point, it is a scam. It was promised more than a year ago. I will order an Evo X2 next week. I need to run models now, not in 2 years, and maybe train some. For training, just rent an A100 for $1 per hour!!! You can recreate GPT-3 for 10 bucks!)
No_Conversation9561@reddit
one of the best inventions that no one has tested yet