DGX Spark: an unpopular opinion

[-]

supahl33t@reddit

So I'm in a similar situation and could use some opinions. I'm working on my doctorate and my research is similar to yours, I have the budget for a dual 5090 system (already have one 5090FE) but would it be better to go dual 5090s or two of these DGX workstations?

[-]

Fit-Outside7976@reddit

What is more important for your research? Inference performance, compute power, or total VRAM? Dual 5090s win on compute power and inference performance. Total VRAM is the DGX GB10 systems.

Personally, I saw more value in the total VRAM. I have two ASUS Ascent GB10 systems clustered running my lab. I use them for some inference workloads (generating synthetic data), but mainly prototyping language model architectures / model optimization. If you have any questions, I'd be happy to answer.

[-]

Chance-Studio-8242@reddit

If I am interested mostly in tasks that involve getting embeddings of millions of sentences in big corpora using models such google's embedding-gemma or even larger Qwen or Nemotron models, is DGX Spark PP/TG speed okay for such a task?

[-]

Caligol@reddit

Hey did you get an answer for this? I'm interested in the same too

[-]

supahl33t@reddit

I'll DM you in the morning if you don't mind. Thank you!

[-]

Toshodin@reddit

Concurrency is it's super power

https://dendro-logic.com/engineering/nvidia-dgx-spark-concurrency-benchmark/

[-]

JackCPiano@reddit

This machine is Nvidia's version 1.00 on an affordable all-in-one computer that has GPU, CPU, storage, power and most of all unified memory. On top of that they are virtually creating an operating system of their own which is no small feat... So teething problems are probably expected but this is the directikn Nvidiaust go to stay competitive... Apples M5 and future chips will pose real competition to NVIDIA. They offer unified memory, the GPU and CPU consume about an eighth of the power compared to their NVIDIA counterpart. Not to mention you don't need to buy expensive servers from DELL, HP, et al to house those NVIDIA GPUs.

[-]

Xigongda@reddit

How does it compare to a similarly priced Mac Studio?

[-]

CatalyticDragon@reddit

That's probably the intended use case. I think the criticisms are mostly valid and tend to be :

It's not a petaflop class "supercomputer"
It's twice the price of alternatives which largely do the same thing
It's slower than a similarly priced Mac

If the marketing had simply been "here's a GB200 devkit" nobody would have batted an eyelid.

[-]

blue_eyes_pro_dragon@reddit

Do you have any alternatives? I’d love for $2.5k but can’t find any

[-]

CatalyticDragon@reddit

Sadly $2.5k is probably not on the cards at the moment so the closest alternative might be something like this:

- https://www.amazon.com/GMKtec-ryzen_ai_mini_pc_evo_x2/dp/B0F53MLYQ6

It lacks native FP4 support and the 200Gbs NIC but also doesn't cost $4,699.

Sill pretty fun and surprisingly capable.

[-]

blue_eyes_pro_dragon@reddit

Thank you

[-]

SashaUsesReddit@reddit

I do agree; the marketing is wrong. The system is a GB200 dev kit essentially... but nvidia also made a separate GB dev kit machine for \~$90k

Dell Pro Max AI Desktop PCs with NVIDIA Blackwell GPUs | Dell USA

[-]

lambdawaves@reddit

Did you know Asus sells a DGX spark for $1000 cheaper? Try it out!

[-]

flink33@reddit

MSI also has one $1000 cheaper than DGX spark with 4tb. same spec, but better cooling I think.

[-]

blue_eyes_pro_dragon@reddit

Do you have a link?

[-]

ecnecn@reddit

It uses the same chip but the Asus one has no acces not the whole original plattform..

[-]

Standard_Property237@reddit

That’s only for the 1TB storage config. It’s clever marketing on the part of Asus but they prices are nearly identical

[-]

lambdawaves@reddit

So you save $1000 dropping from 4TB SSD to 1TB SSD? I think that’s a worthwhile downgrade for most people especially since it supports USB4 (40Gbps)

[-]

Miserable-Dare5090@reddit

usb 3.2 gen2x2 only == 20gbps

[-]

Standard_Property237@reddit

Yeah seems like a no-brainer trade off. Just spend $1000 less and the spend a couple hundred on a BUNCH of external storage

[-]

Fit-Outside7976@reddit

Can confirm. I have a 48TB DAS connected via USB4

[-]

here_n_dere@reddit

Wondering if ASUS can pair with an NVidia DGX spark through C2C?

[-]

Standard_Property237@reddit

I imagine you could, it’s the same hardware

[-]

Professional_Mix2418@reddit

It is a different configuration. I looked, I paid with my own money for one. Naturally I was attracted by the headlines. But if you use the additional storage, and like it low maintenance within the single box, there is no material price difference.

[-]

Igot1forya@reddit

I love mine. Just one slight mod...

[-]

blue_eyes_pro_dragon@reddit

Why?

[-]

tired_fella@reddit

Wonder if you can use something like Noctua 90mm fans.

[-]

MoffKalast@reddit

Any reduction in that trashy gold finish is a win imo, this thing would not look out of place in the oval office lavatory.

[-]

Igot1forya@reddit

I've never cared about looks, it's always function over form. I hate RGB or anything flashy.

[-]

ANTIVNTIANTI@reddit

same🙂

[-]

v01dm4n@reddit

There are always other vendors.

[-]

gotaroundtoit2020@reddit

Is the Spark thermal throttling or do you just like to run things cooler?

[-]

Igot1forya@reddit

I have done this to every GPU I've owned, added additional cooling to allow the device to remain in boost longer. Seeing the reviews of the other Sparks out there one theme kept pooping up, Nvidia priority was on silent operation and the benchmarks placed it dead last vs the other (cheaper) variants.

The reviewers said that the RAM will throttle at 85C, while I've never hit this temp (81C was my top), the Spark remains extremely high. Adding the fans has dropped the temps by 5C. My brother has a CNC machine and I'm thinking about milling out the top and adding a solid copper chimney with a fin stack.:)

[-]

thehoffau@reddit

ITS WHISPER QUIET!!!

[-]

Infninfn@reddit

I can hear it from here

[-]

Igot1forya@reddit

It's actually silent. The fans are just USB powered. I do have actual server fans I thought about putting on there, though lol

[-]

Infninfn@reddit

Ah. For a minute I thought your workspace was a mandatory ANC headphone zone.

[-]

Igot1forya@reddit

It could be the Spark is living on top of my QNAP which is on top of my server rack in a server closet just off my home office.

[-]

pineapplekiwipen@reddit

I mean that's its intended use so it makes sense that you are finding it useful. But it's funny you're comparing it to 5090 here as it's even slower than a 3090.

[-]

Better_Dress_8508@reddit

I question this assessment. If you want to build a system with 4 3090s your total cost will come close to the price of a DGX (i.e., motherboard, PSU, memory, risers, etc.)

[-]

MathematicianLow3628@reddit

You forgetting that's still under 100gb vs 128gb and way cheaper to run considering power consumption.

[-]

SashaUsesReddit@reddit

I use sparks for research also.. It also comes down to more than just raw flops vs 3090 etc... 5090 can support nvfp4; a place where a lot of research is taking place for scaling in future (although he didn't specifically call out his cloud resources supporting that)

Also, this preps workloads for larger clusters on the Grace Blackwell aarch64 setup.

I use my spark cluster for software validation and runs before I go and spend a bunch of hours on REAL training hardware etc

[-]

Electrical_Heart_207@reddit

Interesting use of Spark for validation. When you're testing on 'real' training hardware, how do you typically provision that? Curious about your workflow from local dev to actual GPU runs.

[-]

pineapplekiwipen@reddit

That's all correct. And I'm well aware that one of DGX Spark's selling points is its FP4 support, but the way he brought up performance made it seem like DGX spark was only slightly less powerful than a 5090 when it fact it's like 3-4 times less powerful in raw compute and also severely bottlenecked by ram bandwidth.

[-]

SashaUsesReddit@reddit

Very true and fair

[-]

dtdisapointingresult@reddit

Four 3090s will beat a single DGX spark at both price and performance

Will they?

Where I am 4 used 3090 are almost the same price as 1 new DGX Spark
you need a new mobo to fit 4 cards, new case, new PSU, so really it's more expensive
You will spend a fortune in electricity on the 3090s
You only get 96GB VRAM vs DGX's 128GB
For models that don't fit on a single GPU (ie the reason you want lots of VRAM in the first place) I suspect the speed will be just as bad as DGX if not worse, due to all all the traffic

If someone here has 4 3090s willing to test some theories, I got access to a DGX Spark and can post benchmarks.

[-]

ItsZerone@reddit

In what world are you building a quad 3090 rig for under 4k usd in this market?

[-]

v01dm4n@reddit

A youtuber has done this for us. Here you go.

[-]

Professional_Mix2418@reddit

Indeed, and then you have the space requirements, the noise, the tweaking, the heat, the electricity. Nope give me my little DGX Spark any day.

[-]

KontoOficjalneMR@reddit

For models that don't fit on a single GPU (ie the reason you want lots of VRAM in the first place) I suspect the speed will be just as bad as DGX if not worse, due to all all the traffic

For inference you're wrong, the speed will still be pretty much the same as with a single card.

[-]

dtdisapointingresult@reddit

My bad, speed goes up, but it's not much. I just remembered this post where 1x 4090 vs 2x 4090 only meant going from 19.01 to 21.89 tok/sec faster inference.

https://www.reddit.com/r/LocalLLaMA/comments/1pn2e1c/llamacpp_automation_for_gpu_layers_tensor_split/nu5hkdh/

[-]

Pure_Anthropy@reddit

For training it will depend on the motherboard and the amount of offloading you do and the type of model you train. You can stream the model asynchronously while doing the compute. For image diffusion model I can fine-tune a image diffusion model 2 times bigger than my 3090 with a 5/10% speed decrease.

[-]

Ill_Recipe7620@reddit

The benefit of the DGX Spark is the massive memory bandwidth. A 3090 (or even 4) will not beat DGX Spark on applications where memory is moving between CPU/GPU like CFD (Star-CCM+) or FEA. NVDA made a mistake marketing it as a 'desktop AI inference supercomputer'. That's not even its best use-case.

[-]

FirstOrderCat@reddit

Do large moe models require lots of bandwidth for inference?

[-]

v01dm4n@reddit

They need high internal gpu-mem bandwidth.

[-]

Electrical_Heart_207@reddit

Interesting take on the DGX Spark. What's driving your hardware decisions these days - cost, availability, or something else?

[-]

DarqOnReddit@reddit

bot reply

[-]

FullstackSensei@reddit

You are precisely one of the principal target demographies the Spark was designed for, despite so many in this community thinking otherwise.

Nvidia designed the Spark to hook up people like you on CUDA early and get you into the ecosystem at a relatively low cost for your university/institution. Once you're in the ecosystem, the only way forward is with bigger clusters of more expensive GPUs.

[-]

advo_k_at@reddit

My impression was they offer cloud stuff that’s supposed to run seamlessly with whatever you do on the spark locally - I doubt their audience are in a market for a self hosted cluster

[-]

FullstackSensei@reddit

Huang plans far longer into the future than most people realize. He sank literally billions into CUDA for a good 15 years before anyone had any idea what it is or what it does, thinking that: if you build it, they will come.

While he's milking the AI bubble to the maximum, he's not stupid and he's planning how to keep Nvidia's position in academia and industry after the AI bubble bursts. The hyoerscalers' market is getting a lot more competitive, and he knows once the AI bubble pops, his traditional customers will go back to being the bread and butter of Nvidia: universities, research institutions, HPC centers, financial institutions, and everyone who runs small clusters. None of those have any interest in moving to the cloud.

[-]

Technical_Ad_440@reddit

can you hook 2 of them together and get good speed from them? if you can hook 2 or 3 then they are really good price for what they are 4 would give 256gb vram. and hopefully they make AI stuff for us guys we want AI to i want all my things local and i also want eventual agi local and in a robot to. i would love a 1tb vram model that can actually run the big llms.

am also looking for ai builds that can do video and image to. ive noticed that "big" things like this are mainly for text llms

[-]

FullstackSensei@reddit

Simply put, you're not the target audience for the spark and you'll be much better off with good old PCIe GPUs.

[-]

Glum-Ad3404@reddit

Does having two Sparks speed up inference?

I think the use case is varied. You aren’t running 80-120B models locally on a 5090. You could run x8/x8 on mb with two 5090’s then you are over the cost of two Sparks. I don’t know if’s good local LLM sits in the space people are describing, nor do I think they want to pay the utilizes of doing such. I think the Spark is the gateway to the machine people want, maybe that’s gen 2 when they can increase bandwidth and give you 128-512gb of vram through SOCAMM/2. What is most important to you token speed or model ability? Maybe you want to run multiple models of various sizes for different applications, this would be another use case. A high powered local inference machine on a decent sized model for $ 4-4600 doesn’t exist.

[-]

Wolvenmoon@reddit

I just want Spark pricing for 512GB of RAM and 'good enough' inference to run for a single person to develop models on. :'D

[-]

Technical_Ad_440@reddit

hmm i'll look at just gpus then hopefully the big ones drop in price relatively soon. there is so many different big high end ones its annoying to try and keep up with what's good and such whats the big server gpu and the low end server gpus.

[-]

0xd34db347@reddit

Chill with the glazing, CUDA was a selling point for cryptocurrency mining before anyone here had ever heard of a tensor, it was not some visionary moonshot project.

[-]

Standard_Property237@reddit

the real goal NVIDIA has with this box from an inference standpoint is to get you using more GPUs from their Lepton marketplace or their DGX cloud. The DGX and the variants of it from other OEMs really are aimed at development (not pretraining) and finetuning. If you take that at face value it’s a great little box and you don’t necessarily have to feel discouraged

[-]

Comrade-Porcupine@reddit

Exactly this and it looks like a relatively compelling product and I was thinking of getting one for myself as an "entrance" to kick my ass into doing this kind of work.

Then I saw Jensen Huang interviewed about AI and the US military and defense tech and I was like...

Nah.

[-]

MoffKalast@reddit

It's "the first one's free, kid" of Nvidia.

[-]

Kwigg@reddit

I don't actually think that's an unpopular opinion here. It's great for giving you a giant pile of VRAM and is very powerful for it's power usage. It's just not what we were hoping for due to its disappointing memory bandwidth for the cost - most of us here are running LLM inference, not training, and that's one task it's quite mediocre with.

[-]

Lucky7142857@reddit

From your opinion, what would be the best beast for running local LLM inference? I am thinking a Mac studio M3 Ultra but I would prefer a ECC RAM if I want to build a serious product for corporates.

[-]

Kwigg@reddit

The current best is stacking as many RTX Pro 6000s as you can in a machine. ECC RAM is sort of irrelevant for inference because you don't do it on the CPU.

Macs are very powerful inference machines but have bad prompt processing. The new M5 promises to change that, but we don't have the M5 Ultra yet.

[-]

Officer_Trevor_Cory@reddit

my beef with Spark is that it only has 128GB of memory. it's really not that much for the price

[-]

EvilPencil@reddit

And now it’s a bargain for the price…

[-]

pm_me_github_repos@reddit

I think the problem was it got sucked up by the AI wave and people were hoping for some local inference server when the *GX lineup has never been about that. It’s always been a lightweight dev kit for the latest architecture intended for R&D before you deploy on real GPUs.

[-]

bigh-aus@reddit

I look forward to when these come on the secondary market after The Mac m5 ultra comes out, and people just wanting inference sell the spark and buy them instead.

[-]

IShitMyselfNow@reddit

Nvidias announcement and marketing bullshit kinda implies it's gonna be great for anything AI.

https://nvidianews.nvidia.com/news/nvidia-announces-dgx-spark-and-dgx-station-personal-ai-computers

to prototype, fine-tune and inference large models on desktops

delivering up to 1,000 trillion operations per second of AI compute for fine-tuning and inference with the latest AI reasoning models,

The GB10 Superchip uses NVIDIA NVLink™-C2C interconnect technology to deliver a CPU+GPU-coherent memory model with 5x the bandwidth of fifth-generation PCIe. This lets the superchip access data between a GPU and CPU to optimize performance for memory-intensive AI developer workloads.

I mean it's marketing so of course it's bullshit, but 5x the bandwidth of fifth-generation PCIe sounds a lot better than what it actually ended up being.

[-]

DataGOGO@reddit

All of that is true, and is exactly what it does, but the very first sentence tells you exactly who and what it is designed for:

Development and prototyping.

[-]

powerfulparadox@reddit

And yet there's that pesky word "inference" in the same sentence.

[-]

DataGOGO@reddit

Yes, as part of development and prototyping.

Buying a spark to run a local LLM is like buying a lawn mower to trim the hedges.

[-]

powerfulparadox@reddit

Fair. But that list could be interpreted as a list of use cases rather than a single use case described with three aspects of said use case.

Of course, we'd all be living in a much better world if most people learned and applied the skill of looking past the marketing/hype and actually paying attention to all the relevant information that might keep them from disappointment and wasted time and money.

[-]

Sorry_Ad191@reddit

but you can't really prototype anything that will run on Hopper sm90 or Enterprise Blackwell sm100 since the architectures are completely different? sm100 the datacenter blackwell card has tmem and other fancy stuff that these completely lack so I don't understand the argument for prototyping when the kernels are not even compatible?

[-]

PostArchitekt@reddit

This where the Jetson Thor fills the gap in the product line. As it just needs tuning for memory and core logic for something like a B200 but it’s the same architecture. A current client need plus one of the many reasons why I grabbed one for 20% discount going on for the holidays. A great deal considering the current RAM prices as well.

[-]

Mythril_Zombie@reddit

Not all programs are run on those platforms.
I prototype apps on Linux that talk to a different Jetson box. When they're ready for prime time, I spin up runpod with the expensive stuff.

[-]

Cane_P@reddit

That's the speed between the CPU and GPU. We have [Memory]-[CPU]=[GPU], where "=" is the 5x bandwidth of PCIe. It still needs to go through the CPU to access memory and that bus is slow as we know.

I for one, really hoped that the memory bandwidth would be closer to the desktop GPU speed or just below it. So more like 500GB/s or better. We can always hope for a second generation with SOCAMM memory. NVIDIA apparently dropped the first generation and is already at SOCAMM2, and it is now a JEDEC standard, instead of a custom project.

[-]

Hedede@reddit

But we knew that it'll be LPDDR5X with 256-bit bus from the beginning.

[-]

Cane_P@reddit

Not when I first heard rumors about the product... Obviously we don't have the same sources. Because the only thing that was known when I found out about it, was that it was an ARM based system with an NVIDIA GPU. Then months later, I found out the tentative performance, but still no details. It was about half a year before the details got known.

[-]

BeginningReveal2620@reddit

NGREEDIA - Miking everyone.

[-]

emprahsFury@reddit

nvidia absolutely marketed it as a better 5090. THe knock-off h100 was always second fiddle to the blackwell gpu, but with 5x the ram

[-]

florinandrei@reddit

I don't actually think that's an unpopular opinion here.

It's quite unpopular with the folks who don't understand the difference between inference and development.

They might be a minority - but, if so, it's a very vocal one.

Welcome to social media.

[-]

DataGOGO@reddit

The Spark is not designed or intended for people to just be running local inference

[-]

Novel-Mechanic3448@reddit

It's not vram

[-]

-dysangel-@reddit

it's not not vram

[-]

Late-Assignment8482@reddit

I actually ended up getting two of the Lenovo units. Loving them. Trying to talk myself out of a third (that's the max you can wire together without a switch).

And I'm doing primarily inference right now but want to do some image and video gen soon.

I just don't *need* high tokens/second. For what I do, being able to load Qwen3-VL-235B into vLLM with two 256k context streams is a quantum leap. It's getting 20 tok/s generation, on average. Would 3x Blackwells blow it away on tok/s? Sure. Or it'd pay off my car loan!

Keep in mind that's human reading speed, plus a smidge, for two simultaneous chats, two reviews of 10+ chapters of draft, 40,000 line codebases...for less than I'd have paid for one Pro 6000.

And when I want to generate video or do fine tuning, I'm not frantically cramming things into >32GB memory.

For the patient, it's pretty unbeatable, TBH.

[-]

mycall@reddit

That was a fun read. I was doing napkin math and know should, with using parallelism and vLLM, be able to run 16 concurrent 10tok/s sessions. Have you tried that before?

[-]

AdTop2399@reddit

Thoughts on agentic workflows on spark os v/s on Mac OS? Anyone got first-hand experience w multiple agents across several devices in a cluster?

[-]

bigtom_x@reddit

The issue I see is there is very little info on how the Spark actually works for model training. Every influencer Nvidia sent a unit to has been doing inference. That’s ok and it is a benchmark people should know about, but what are the real workflows and advantages for model training and fine tuning with the Spark.

The memory bandwidth could have been a little better, but I understand the power limitation targets. Why haven’t we seen water cooled units yet?

The price is fair considering all the documentation it comes with and the software. If you want to actually learn how to work in the enterprise AI space, it’s a great tool for that.

[-]

bigtom_x@reddit

The issue I see is there is very little info on how the Spark actually works for model training. Every influencer Nvidia sent a unit to has been doing inference. That’s ok and it is a benchmark people should know about, but what are the real workflows and advantages for model training and fine tuning with the Spark. The memory bandwidth could have been a little better, but I understand the power limitation targets. Why haven’t we seen water cooled units yet? The price is fair considering all the documentation it comes with and the software. If you want to actually learn how to work in the enterprise AI space, it’s a great tool for that.

[-]

Expensive-Paint-9490@reddit

The simple issue is: with 273 GB/s bandwidth, a 100 GB model will generate 2.5 token/second. This is not going to be usable for 99% of use cases. To get acceptable speeds you must limit model size to >= 25 GB, and at that point an RTX 5090 is immensely superior in every regard, at the same price point.

For the 1% niche that has an actual use for 128 GB at 273 GB/s it's a good option. But niche, as I said.

[-]

No-Working7460@reddit

Wouldn't a single RTX 5090 only have 32GB of memory and hence OOM on even fairly small models?

[-]

Expensive-Paint-9490@reddit

My point is: anything above 30GB in size is going to be very slow on Spark. Then, if you are going to run a model using less than 30GB size (weights + context), why buying a Spark at all? You can fit those models in a single 5090 (not OOM) and get hugely better performance.

[-]

No-Working7460@reddit

Yes you're right, makes perfect sense. I am now starting to wonder what exactly would be uses cases where DGX brings value even to a hobbyst (2.5 tokens/second is really slow).

[-]

Historical-Internal3@reddit

Dense models run slow(ish). MoEs are just fine.

I’m at about 60 tokens/second with GPT OSS 120b using SGLang.

Get about 50ish using LM Studio.

[-]

GPTrack--dot--ai@reddit

Terrible fake ad.

[-]

GPTshop--dot--ai@reddit

obviously just Nvidia advertising.

[-]

modzer0@reddit

That's exactly what it's supposed to be used for. Research and development for people with access to larger DGX clusters. It was never meant to be a pure inference machine. Quantizing and tuning are the areas where it really shines. You develop on the Spark and you deploy to a larger system without having to change code because of the common hardware and toolbase.

Mine has paid for itself many times over just from not having to use cloud instances for work that really doesn't need the full power of those systems until I actually deploy it to production.

Much of the hate comes from people who assume it's overpriced trash because it's not a super inference machine. It was never designed to be one. It's for people to use so they don't have to do development work on expensive production grade systems like the B200s yet allows them to deploy their work to those systems easily.

[-]

ipepe@reddit

Hey. I'm a web dev interested in AI. What kind of job is that? What kind of companies are using these kind of technologies?

[-]

devshore@reddit

Isnt this more expensive and yet slower than the apple-sillicon options?

[-]

ItsZerone@reddit

That depends on what you're trying to do.

[-]

korino11@reddit

DGX - useles shit... Idiots only can buy that shit.

[-]

Regular-Forever5876@reddit

it is not even comparable.. writing code for Mac is writing code for 10% desktop user and practically 0% of the servers in the world.

Unless for personal usage, it is totally useless and worthless the time spent doing it for research. It has no meaning.at all.

Because inference idiots (only to quote your dictionary of expressiveness) are simple PARASITES that exploit the work of others without ever contributing it back... yeah, let them buy a Mac, while real researcher do the heavy lifting on really usefull scalable architecture where the Spark is the smallest easiest available device to start devwlopping and scaling IP afterwards.

[-]

ANTIVNTIANTI@reddit

you high homie?

[-]

Regular-Forever5876@reddit

No but apparently reddit users are allergic to sarsasm and trurhful statement..

OSx is roifhrly 13% desktop wprld wide: https://gs.statcounter.com/os-market-share/desktop/worldwide

And LESS THEN 0.01% API or internet is served by Apple servers; https://w3techs.com/technologies/details/os-macos

[-]

ANTIVNTIANTI@reddit

lol! true! 😜😂😂 much love! I apologize, my humor sucks//I swear there’s more in my head that i don’t end up writing but like, i assume, in that moment—that i had? i don’t know how to explain it lol, COVID brain fog completely ruined me lol 😅😂😟😕😣😖😫😭😭😭😭😭

[-]

GPTshop@reddit

right!!!

[-]

Sl33py_4est@reddit

I bought one for shits and gigs, and I think its great. it makes my ears bleed tho

[-]

Regular-Forever5876@reddit

Not sire you have one for real... the Spark is PURE SILENCE, I've never heard a mini workstation who was that quiet... 😅

[-]

Sl33py_4est@reddit

google "dgx spark high pitched whirring"

[-]

Regular-Forever5876@reddit

I dont have to because my DGX is literally sitting here next to my keyboard. But I did that and gave me 0 perfect match.

DGX is one of the most silent unit I actually ever had. If your unit is whining, that's a defective unit and you should ask for repair or replacement.

I got 3 DGX and one was defective, NVIDIA replaced it no questions asked: the SSD simply stopped working one day without prior notice. The two other units are perfectly fine.

[-]

Sl33py_4est@reddit

nice

[-]

ellyarroway@reddit

I mean you need to get people started to fix the bugs on arm cuda, without having to own or rent $50000 GH200 or half million GB300. Working on GH200 for two years the ecosystem pain is real.

[-]

TensorSpeed@reddit

Anytime there's a discussion about it the conclusion is the same:
Bad if you expect inference performance, but good for developers and those doing training.

[-]

FormalAd7367@reddit

For the money, i’d rather get a used Rig.. if i need update of ram or gpu, i can just get some from ebay

[-]

The_Paradoxy@reddit

What are memory bandwidth and latency like? Branch prediction? I'm more interested in how it completes with an AMD 300A or 300C than anything else

[-]

Baldur-Norddahl@reddit

But why not just get a RTX 6000 Pro instead? Almost as much memory and much faster.

[-]

Alive_Ad_3223@reddit

Money bro .

[-]

SashaUsesReddit@reddit

Lol why not spend 3x or more

The GPU is 2x the price of the whole system, then you need a separate system to install to, then higher power use and still less memory if you really need the 128GB

[-]

NeverEnPassant@reddit

Edu rtx 6000 pros are like $7k.

[-]

SashaUsesReddit@reddit

ok... so still 2x+ what EDU spark is? Plus system and power? Plus maybe needing two for workload?

[-]

NeverEnPassant@reddit

The rest of the system can be built for $1k, then the price is 2x and the utility is way higher.

[-]

SashaUsesReddit@reddit

No... it can't.

Try building actual software like vllm with only whatever system and ram come for $1k.

It would take you forever.

Good dev platforms are a lot more than one PCIe slot.

[-]

NeverEnPassant@reddit

You mention vllm, so if we are talking just inference, a 5090 + DDR5-6000 shits all over the spark for less money. Yes, even for models that don't fit in VRAM.

This user was specifically talking about training. And I'm not sure what you think VLLM needs. The spark is a very weak system outside of RAM.

[-]

SashaUsesReddit@reddit

I was referencing building software. Vllm is an example as it's commonly used for RL training workloads.

Have fun with whatever you're working through

[-]

NeverEnPassant@reddit

You words have converged into nonsense. I'm guessing you bought a Spark and are trying to justify your purchase so you don't feel bad.

[-]

SashaUsesReddit@reddit

Let's run some tests then. I have 5090s, 6000s, B200, B300, sparks etc.

Let's settle it with data. Your inf only arguments with only llama cpp experience is daft

[-]

NeverEnPassant@reddit

Feel free to explain what you think a $1k system + rtx 6000 pro might be lacking that would not be a problem on a Spark (other than a 32GB memory difference).

[-]

SashaUsesReddit@reddit

Sent you a DM:

I think we got off to the wrong foot on that thread. I'd love to actually break down the use cases and provide useful data back to the community. I have also had a couple glasses of scotch tonight so it evidently makes my reddit comments more sassy.

My apologies!

I run large training and inference workloads across several hundred thousand GPUs and would love to see what inflection points work.

Thoughts?

Posting same comment to the thread for transparency

[-]

NeverEnPassant@reddit

Main character syndrome much?

[-]

SashaUsesReddit@reddit

.....what?

I apologized and then proposed we work on data together?

[-]

NeverEnPassant@reddit

You have:

Flexed your credentials and hardware collection.
Talk as if you see yourself in some kind of mentor relationship.
You think you can be rude and abrasive so long as you want, until you don't want to any longer and everyone else must turn on a dime.
Not answered a very basic question central to your claims.
Put on some weird public show about sending a DM and also posting it in the thread.

You are really toxic.

[-]

SashaUsesReddit@reddit

Yeah man.. that's a take.

I posted and DM'd so we could chat and also not be an asshole that just DMs without apologizing on a public thread for having a bad attitude as per my 'sassy' responses when I had some scotch etc as stated. It's not a public show, it was an aim to connect with you and also take public accountability? Just a DM would be weirder?

I'm not here to mentor anyone. I try to share my experiences since I do this for a living at a huge scale. Building and deploying models. I contribute to the libraries everyone here uses in a large way, so I want to chime in.

What basic question didn't I answer? I stated we should test throughput on various configs outside of a random llama.cpp experience you have.

It's not my aim to be abrasive, as is why I wanted to start over with you and be collaborative.

Don't turn on a dime, but I hardly see how you have to "turn on a dime" when the relationship is a few reddit comments long lol. Let's grow up.

[-]

Mythril_Zombie@reddit

You seem to want to complain about it to make yourself feel better about it not being some miracle box of cheap, fast, local inference to rival data centers.
Because unless it could do that, you guys are never going to stop being angry that they made this thing.

[-]

NeverEnPassant@reddit

rtx 6000 pro is 2x the cost and 6-7x the performance

it's a shit product

[-]

Professional_Mix2418@reddit

You are clearly not the target audience. This isnt' for consumers, this is for professionals.

[-]

NeverEnPassant@reddit

So is the rtx 6000 pro. I know because it has “pro” in the name. Except it has 6-7x more performance for 2x the cost.

[-]

Baldur-Norddahl@reddit

Surely the university already has a PC they can use for the card.

[-]

Professional_Mix2418@reddit

Then one also has to get a computer around it, store it, power it, deal with the noise, the heat. And by the time the costs are added for a suitable PC, it is a heck of a lot more expensive. Have you seen the prices of RAM these days...The current batch of DGX Spark was done on the old price, the next won't be as cheap...

Nope I've got mine nicely tucked underneath my monitor. Silent, golden, and sips power.

[-]

SignificantDress355@reddit

I totally get you from like a research perspective :)

Still i dont linke it because of:

-Price -Bandwidth -Connectivity

[-]

PeakBrave8235@reddit

Eh. A Mac is simply better for the people the DGX Spark is mostly targeting.

[-]

Fit-Outside7976@reddit

Has anyone gotten training going on mac studios?

[-]

ANTIVNTIANTI@reddit

not yet but i can, not sure why PC peeps never learn about Mac’s, we Mac folk definitely know pc shit, most of us like, ya know, started with Pcs lol😜

[-]

PeakBrave8235@reddit

You can do overnight fine tuning.

[-]

onethousandmonkey@reddit

Tbh there is a lot of (unwarranted) criticism around here about anything but custom built rigs.

DHX Spark def has a place! So does the Mac.

[-]

aimark42@reddit

What if you could use both?

https://blog.exolabs.net/nvidia-dgx-spark/

I'm working on building this cluster.

[-]

Slimxshadyx@reddit

Reading through the post right now and it is a very good article. Did you write this?

[-]

aimark42@reddit

I'm not that smart, but I am waiting for a Mac Studio to be delivered so I can try this out. I'm building out an Mini Rack AI Super Cluster which I hope to get posted soon.

[-]

ANTIVNTIANTI@reddit

what mac, fam? i’ve got the m3 256GB, it’s sassy 😁 I go back and forth between regretting and not regretting getting the 512 though, it’s just, so much money to throw down, i’m hoping to get a job in the field though so… hopefully it pays for itself?! lol! Also the speed is nice, had to add buffers to my chat apps i built awhile ago, my darned gui just, couldn’t keep up (using PyQt6…. I.. don’t know why, i mean, i love it, but, prolly should’ve learned c++ and just go that Qt OG route lol?!?! anywho sorry i’m just rambling lol!!

[-]

onethousandmonkey@reddit

True. Very interesting!

[-]

Mythril_Zombie@reddit

It's not "custom built rigs" that they hate, it's "fastest tps on the planet or is worthless."
It helps explain why they're actually angry that this product exists and can't talk about it without complaining.

[-]

onethousandmonkey@reddit

I meant that custom built rigs are seen as superior, and only those escape criticism. But yeah, tps or die misses a chunk of the use cases.

[-]

Brilliant-Ice-4575@reddit

Can't you do similar on an even lower budget with 395?

[-]

RedParaglider@reddit

I have the same opinion about my strix halo 128gb , it's what I could afford and I'm running what I got. It's more than a lot of people and I'm grateful for that.

That's exactly what these devices are for, research.

[-]

noiserr@reddit

Love my Strix Halo as well. It's such a great and versatile little box.

[-]

RedParaglider@reddit

Yea.. a speed demon it isn't, but it is handy.

[-]

power97992@reddit

If it had 500GB/s of bandwidth, it would've been okay for inference.

[-]

Mikasa0xdev@reddit

DGX Spark's massive VRAM is a game changer for small research groups.

[-]

I1lII1l@reddit

Ok, but is it any better than the AMD Ryzen AI+ 395 with 128GB LPDDR5 RAM, which is for example in the Bosgame for under 2000€? Does anything justify the price tag of the DGX Spark?

[-]

Fit-Outside7976@reddit

The NVIDIA ecosystem is the selling point there. You can develop for grace blackwell systems.

[-]

noiserr@reddit

But this is completely different from a Grace Blackwell system. The CPU is not the same and the GPUs are much different.

[-]

SimplyRemainUnseen@reddit

Idk about you but I feel like comparing an ARM CPU and Blackwell GPU system to an ARM CPU and Blackwell GPU system isn't that crazy. Sure the memory access isn't identical, but the software stack is shared and networking is similar allowing for portability without major reworking of a codebase.

[-]

noiserr@reddit

It's a completely different memory architecture which is a big deal in optimizing these solutions. I really don't buy this argument that DGX helps you write software for datacenter GPUs.

[-]

seppe0815@reddit

by a spark and hop in in Nvidia clouds .... the only reason for this crap

[-]

SanDiegoDude@reddit

Yeah, I've got a DGX on my desk now and I love it. Won't win any speed awards, but I can set up CUDA jobs to just run in the background through datasets while I work on other things and come back to completed work. No worse than batching jobs on a cluster, but all nice and local, and really great to be able to train these larger models that wouldn't fit on my 4090.

[-]

960be6dde311@reddit

Agreed, the NVIDIA DGX Spark is an excellent piece of hardware. It wasn't designed to be an top-performing inference device. It was primarily designed to be used for developers who are building and training models. Just watched one of the NVIDIA developer Q&As on YouTube and they covered this topic about the DGX Spark design.

[-]

melikeytacos@reddit

Got a link to that video? I'd be interested to watch...

[-]

960be6dde311@reddit

Yes, I believe it is this one: https://www.youtube.com/watch?v=ry09P4P88r4

[-]

melikeytacos@reddit

Thank you!

[-]

GPTshop@reddit

100% a bot

[-]

Salt_Economy5659@reddit

just use a service like runpod and don’t waste the money on those depreciating tools

[-]

No_Gold_8001@reddit

Yeah. People have a hard time understanding that sometimes the product isnt bad. Sometimes it was simply not designed for you.

[-]

Freonr2@reddit

There's "hard time understanding" and "hyped by Nvidia/Jensen for bullshit reasons." These are not the same.

[-]

Mythril_Zombie@reddit

Falling for marketing hype around a product that hadn't been released is a funny reason to be angry with the product.

[-]

Freonr2@reddit

What changed in the sales pitch before and after actual release? Jensen gave pretty similar pitches at GTC in March and again at GTC DC month or two ago.

"Anger" is a projection.

[-]

noiserr@reddit

Spark lets us prototype and train foundation models, and (at last) compete with groups that have access to high performance GPUs like the H100s or H200s.

But a unified memory system is completely different from using dedicated GPUs. Not sure what the advantage is here if you already have access to the target hardware that's orders of magnitude better.

[-]

ab2377@reddit

i wish you wrote much more like what kinds of models you train, how many parameters, the size of your datasets, and how much time does this take to train in different configurations, and more

[-]

noiserr@reddit

I agree. It reads more like Guerilla advertising.

[-]

keyser1884@reddit

The main purpose of this device seems to have been missed. It allows local r&d running the same kind of architecture used in big ai data centres. There are a lot of advantages to that if you want to productize.

[-]

noiserr@reddit

It's not the same kind of architecture though..

[-]

whosbabo@reddit

I don't know why anyone would get the DGX Spark for local inference when you can get 2 Strix Halo for the price of one DGX Spark. And Strix Halo is actually a full featured PC.

[-]

Professional_Mix2418@reddit

Totally agreed. I've got one as well. Got it configured for two purposes, privacy aware inference and rag, and prototyping and training/tuning models for my field of work. It is absolutely perfect for that, and does it in silence, without excessive heat, the cuda cores give great compatibility.

And let's be clear even at inference it isn't bad, sure there are faster (louder, hotter, more energy consuming) ways no doubt. But it is still quicker than I can read ;)

Oh and then there are the CUDA compatibility in a silent, energy efficient package as well. Yup I use mine professionally and it is great.

[-]

Phaelon74@reddit

Like all things, it's use-case specific and your use case, is thr intended audience. People are lazy, they just want one ring to rule them all instead of doing hard work, and aligning use-cases.

[-]

Ill_Recipe7620@reddit

I have one. I like it. I think it's very cool.

But the software stack is ATROCIOUS. I can't believe they released it without a working vLLM already installed. The 'sm121' isn't recognized by most software and you have to force it to start. It's just so poorly supported.

[-]

SashaUsesReddit@reddit

Vllm main branch has supported this since launch and nvidia posts containers

[-]

Ill_Recipe7620@reddit

The software is technically on the internet. Have you tried it though?

[-]

SashaUsesReddit@reddit

Yes. I run it on my sparks, and maintain vllm for hundreds of thousands of GPUs

[-]

Ill_Recipe7620@reddit

Yeah I'm trying to use gpt-oss-120b to take advantage of the MXFP4 without a lot of success.

[-]

SashaUsesReddit@reddit

MXFP4 is different than the nvfp4 standards that nvidia is building for; but OSS120 generally works for me in the containers. If not, please post your debug and I can help you fix it.

[-]

Historical-Internal3@reddit

https://forums.developer.nvidia.com/t/run-vllm-in-spark/348862/116

TL:DR - MXFP4 not fully optimized on vLLM yet (works though).

[-]

the__storm@reddit

Yeah, first rule of standalone Nvidia hardware: don't buy standalone Nvidia hardware. (Unless you're a major corporation and have an extensive support contract.)

[-]

SashaUsesReddit@reddit

It isn't though.... people don't RTFM

[-]

amarao_san@reddit

massive amount of memory

With every week this is more and more wise decision. Until scarcity gone, it will be hell of investment.

[-]

whyyoudidit@reddit

how is it fine tuning a small lora of like 10gb compared to a 3090?

[-]

belgradGoat@reddit

With Mac Studio I get even more horsepower, no cuda, but I have actual real pc

[-]

GPTshop@reddit

Apple = P3D0s

[-]

dazzou5ouh@reddit

For a similar price, I went the crazy DIY route and built a 6x3090 rig. Mostly to play around with training small diffusion and flow matching models from scratch. But obviously, power costs will be painful.

[-]

john0201@reddit

That is what it is for.

[-]

GPTshop@reddit

imbeciles?

[-]

quan734@reddit

That's because you have not explored other options. Apple MLX would let you train foundation models with 4x the speed of the spark and you pay the same price (for a MacStudio M2), only drawback is you have to write MLX code (which is kind of the same to pytorch anyway)

[-]

Regular-Forever5876@reddit

it is not even comparable.. writing code for Mac is writing code for 10% desktop user and practically 0% of the servers in the world.

Unless for personal usage, it is totally useless and worthless the time spent doing it for research. It has no meaning.at all.

[-]

danishkirel@reddit

And then not be able to run the prototype code on the big cluster 🥲

[-]

WithoutReason1729@reddit

Your post is getting popular and we just featured it on our Discord! Come check it out!

You've also been given a special flair for your contribution. We appreciate your post!

I am a bot and this action was performed automatically.

[-]

Kugelblitz78@reddit

I like it cause of the low energy consumption - it runs 24/7

[-]

starkruzr@reddit

this is the reason we want to test clustering more than 2 of them for running > 128GB @ INT8 (for example) models. we know it's not gonna knock anyone's socks off. but it'll run faster than like 4tps you get from CPU with $BIGMEM.

[-]

Fit-Outside7976@reddit

Why INT8 out of curiosity? Wouldn't FP8 or NVFP4 be a better choice?

[-]

starkruzr@reddit

probably. just an example to make the VRAM math easy.

[-]

AdDizzy8160@reddit

So, you know, you will need a second one in the near future ;)

[-]

thebadslime@reddit

I just want one to make endless finetunes.

[-]

Fit-Outside7976@reddit

That's why I have two! The training never stops!

[-]

scottybowl@reddit

I love my DGX Spark - simple to setup, powerful enough for my needs

[-]

g_rich@reddit

The DGX Spark was literally designed for your use case; that’s not an unpopular opinion at all. It is designed for research and development, it was not designed as a replacement for someone with a Threadripper, 128 GB of RAM and 4x 5090’s.

[-]

drdailey@reddit

The memory bandwidth hobbled it. Sad.

[-]

Simusid@reddit

100% agree with OP. I have one, and I love it. Low power and I can run multiple large models. I know it's not super fast but it's fast enough for me. Also I was able to build a pipeline to fine tune qwen3-omni that was functional and then move it to our big server at work. It's likely I'll buy a second one for the first big open weight model that outgrows it.

[-]

_VirtualCosmos_@reddit

What are your research aiming for? if I might ask. I'm just curious since I would love to research too.

[-]

Slimxshadyx@reddit

What kind of research are you doing?

[-]

doradus_novae@reddit

I wanted to love it and had high hopes with the exo article. Everything I wanted to do with it was just too slow :/ the best use case i can find for it is async diffusion that i gotta wait on anyways like video and easy diffusion like images

[-]

imnotzuckerberg@reddit

Spark lets us prototype and train foundation models, and (at last) compete with groups that have access to high performance GPUs like the H100s or H200s.

I am curious to why not prototype with a 5060 for example? Why buy a device 10x the price?

[-]

siegevjorn@reddit

My guess is that their model is too big can't be loaded onto small vrams such as 16gb

[-]

Standard_Property237@reddit

I would not train foundation models on these devices, that would be an extremely limited use case for the Spark

[-]

bluhna26@reddit

How many concurrent users are able to run in vllm

[-]

imtourist@reddit

Curious as to why you didn't consider a Mac Studio? You can get at least equivalent memory and performance however I think the prompt processing performance might be a bit slower. Dependent on CUDA?

[-]

LA_rent_Aficionado@reddit

OP is talking about training and research. The most mature and SOTA training and development environments are CUDA-based. Mac doesn't provide this. Yes, it provides faster unified memory at the expense of CUDA. Spark is a sandbox to configure/prove out work flows in advance of deployment on Blackwell environments and clusters where you can leverage the latest in SOTA like NVFP4, etc. OP is using Spark as it is intended. If you want fast-ish unified memory for local inference, I'd recommend the Mac over the Spark for sure, but it loses in virtually every other category.

[-]

onethousandmonkey@reddit

Exactly. Am a Mac inference fanboï, but I am able to recognize what it can and can’t do as well for the same $ or Watt.

[-]

LA_rent_Aficionado@reddit

I’m sure and it’s not to say there likely isn’t already research on Mac. It’s a numbers game, there are simply more CUDA focused projects and advancements out there due to the prevalence of CUDA and all the money pouring into it.

[-]

onethousandmonkey@reddit

That makes sense. MLX won’t be able to compete on volume for sure.

[-]

inaem@reddit

I would rather use AMD units that go head to head with Spark in all specs concerned for half the price if it means I will release research that can be run by people

[-]

Freonr2@reddit

For educational settings like yours, yes, that's been my opinion that--this is a fairly specific and narrow use case to be a decent product.

But that is not really how it was sold or hyped and that's where the backlash comes from.

If Jensen got on stage and said "we made an affordable product for university labs," all of this would be a different story. Absolutely not what happened.

[-]

charliex2@reddit

i have two sparks linked together over qsfp, they are slow. but still useful for testing larger models or such.. i am hoping people will beginning to dump them for cheap, but i know its not gonna happen. very useful to have it self contained as well

going to see if i can get that mikrotik to link up a few more

[-]

Lesser-than@reddit

My fear of the Spark was always extended support.From the beginning of its inception it felt like a one off experimental product. I will admit to being somewhat wrong on that front as it seems they are still treating it like a serios product. Its still just too much sticker price for what it is right now though IMO.

[-]

gaminkake@reddit

I bought the 64GB Jetson Orin dev kit 2 years ago and it's been great for learning. Low power is awesome as well. I'm going to get my company to upgrade me to the Spark in a couple months, it's pretty much plug and play to fine tune models with and that will make my life SO much easier 😁 I require privacy and these units are great for that.

[-]

aimark42@reddit

My biggest issue with the Spark is the overcharging for storage and worse performance than the other Nvidia GB10 systems. Wendel from level1techs mentioned in a video recently that the MSI EdgeXpert is faster than the Spark due to better thermal design by about 10%. When the base Nvidia GB10 platform devices are a $3000 USD, and now 128GB Strix Halo machines are creeping up to 2500, the value proposition for the GB10 platform isn't so bad. They are not the same platform, but dang it CUDA just works with everything. I had a Strix Halo and returned it mostly due to Rocm and drivers not being there yet, for an Asus GX10. I'm happy with my choice.

[-]

Healthy-Nebula-3603@reddit

There is any popular opinion?

[-]

MontageKapalua6302@reddit

All the stupid negative posting about the DGX Spark is why I don't bother to come here much anymore. Fuck all fanboyism. A total waste of effort.

[-]

DataGOGO@reddit

That is exactly what it was designed for.

[-]

complains_constantly@reddit

This is an incredibly popular opinion here lmao

[-]

highdimensionaldata@reddit

You’ve just stated the exact use case for this device.

[-]

drwebb@reddit

And probably didn't pay for it personally

[-]

DerFreudster@reddit

The criticism was more about the broad-based hype more than the box itself. And the dissatisfaction of people who bought it expecting it to be something it's not based on that hype. You are using it exactly as designed and with the appropriate endgame in mind.

[-]

opi098514@reddit

Nah. That’s a popular opinion. Mainly because you are the exact use case it was made for.

[-]

Groovy_Alpaca@reddit

Honestly I think your situation is exactly the target audience for the DGX Spark. A small box that can unobtrusively sit on a desk with all the necessary components to run nearly state of the art models, albeit with slower inference speed than the server grade options.

[-]

jesus359_@reddit

Is there more info? What do you guys do? What kind of competition? What kid of data? What kind of models?

Bunch of test came out when it launched where it was clear its not for inference.

[-]

ArtisticHamster@reddit

It has its use cases of course, the main of which is getting a machine which is very similar to pro-grade serever machine for relatively small price.

However, I think, buying RTX A6000 might be a better choice than buying 2xDGX.