16x DGX Sparks - What should I run?
Posted by Kurcide@reddit | LocalLLaMA | View on Reddit | 524 comments
Let’s build the biggest ever DGX Spark Cluster at home. This is going into my home lab server rack, 2TB of unified memory.
• 16x Sparks
• 1x 200Gbps FS 24 x 200Gb QSFP56 Switch
• 16x QSFP56 DAC cables
Should be all setup by tomorrow afternoon, what should I run?
MrAlienOverLord@reddit
16 .. damn - i only have 8 - glad you're putting in the r&d on bigger gb10 clusters - i was considering adding 8 more but given i only have the crs804-4ddq i would need 4 switches to get that wired up 6 4 4 6 (only 2 used) if i interconnect the switches with 400g - that'd be an additional 3k for the switches and 3k for the cables (ya the breakout cables are not that cheap lol)
please post benchmarks - also i'm sure thomas/azeez from atlas inference - particularly for the sparks - could get quite a bit more oomph out of those nifty devices
that being said i really hope someone cracks the firmware for connectx-7 so we can use regular IB vs ethernet
Kurcide@reddit (OP)
8 is still a huge cluster of these. Would love to compare notes
MrAlienOverLord@reddit
thats my wonky tower of gold - pre racking em up
anactualalien@reddit
Nice paperweights.
RedShiftedTime@reddit
Seeing this has made me realize I shouldn't be chasing hardware and should just be happy getting railed with whatever subscription plan the large providers offer. I was debating spending 10k on the new Mac Studio + some sparks for prefill, but seeing that all this hardware (over $70k worth) is only capable of running Kimi 2.6, it's like, ok sure, privacy, but having to spend $120k in hardware just to get reasonable speeds for these models? I'll just... pay for a sub or API access... I suppose.
caim2f@reddit
Wouldn't ASICs with baked-in models/weights be better? Maybe 30-40 years from now, if we can get the manufacturing process down right, we would have access to personal inference devices with lightning speeds. From what I've gathered they cost over $1,000,000 to produce currently. I can't help but have this gut feeling that GPUs are not the right way to approach this.
RedShiftedTime@reddit
I don't think we will be using LLMs, in the current sense, in this time frame. Something else will be here. General computing isn't going anywhere.
MisticRain69@reddit
The only time stupid expensive local hardware for the SOTA models is the right call is if you make a bazillion dollars from something that's a trade secret you can't use cloud models for, since that would just hand the cloud providers that trade secret.
sn2006gy@reddit
it’s not a trade secret if a token can do it
Psychological-Lynx29@reddit
I think the sweet spot is spending like 7 to 12k on a GPU or multigpu system. That way you can run 27b+ models with max context.
Serprotease@reddit
No point in chasing the latest SOTA with consumer/prosumer level hardware. There is, I think, a limit at around a 400b model (256gb ram/vram) for a usable local llm at an achievable price (less than 10k) with usable performance.
Going above that and you are looking at abysmal pp/tg, a crazy expensive (power and cash) system, and/or a kafkaesque setup.
TechTwentyTwo@reddit
You should run an analysis of your decision to purchase 16 Sparks
thecodeassassin@reddit
Right. See here's the problem.
I have a bunch of these as well and i really don't enjoy running kimi k2.6 or anything else large on it. Just too slow. I always fall back to my rtx 6000 pro cluster for literally anything serious.
cartographr@reddit
Just to ask - what was the primary purpose of this purchase? Lab, mini dev/test data center or production use? This is definitely scale out vs scale up structure (imho) i.e. run more capacity / more throughput rather than run one thing faster or a very large model very slowly.
Dry_Yam_4597@reddit
Sell them and get some H100s.
Kurcide@reddit (OP)
I have a 4x H100 NVL system already in the rack
xamboozi@reddit
I have no idea what that many DGX sparks would do for you that 4x h100's wouldn't.
The DGX spark doesn't have a lot of memory bandwidth and the 200Gbps links are even less throughput, so like.... Why?
Kurcide@reddit (OP)
Can’t run any SOTA open source models on 376gb Vram
thehpcdude@reddit
Would be cheaper and easier to just rent 8x H100's, especially when SOTA is going to be 1T+ params in the near future. Hopefully you didn't actually buy a bunch of sparks.
siete82@reddit
Also pay for the claude subscription, but that's the point of this sub
joefourier@reddit
A Claude subscription that goes offline for hours at least once a month, gets nerfed with the company denying it for weeks when they can't get enough GPUs, and wastes millions of tokens on thinking traces that you can't actually inspect?
illicITparameters@reddit
I can't wait for Anthropic, Google, Microsoft, and OpenAI to start charging by the token.
thehpcdude@reddit
To me the point is more what can I do with reasonable hardware or what hardware a common enthusiast can wield. I think the other half of the point is showing that smaller parameter models can do day-to-day actions with ease.
Buying a bunch of off the shelf hardware to run a SOTA model at home is a waste of not only money but time. Not sure why people think it's some sort of flex, but I may be biased because of my work.
Ok-Internal9317@reddit
True
bigh-aus@reddit
yeah not worth getting the H100s unless you already have them - H200 NVL is better - 4x 141gb, but the price vs 16 dgx sparks - $120k+ vs ~$64k...
Problem is you really need 8x H200s and a machine to use them - getting closer to b200 territory.
Relative_Rope4234@reddit
bro must be a millionaire
sourceholder@reddit
Not anymore.
Successful_Flow1329@reddit
Well, not if he was billionaire before.
VegetableDelay1658@reddit
Yeah this dude has watches that are more expensive than my life
Reasonable_Ad5611@reddit
not anymore
VirtualPercentage737@reddit
He just paid for Jensen's kid's college.
florinandrei@reddit
Or for the 17th alligator leather jacket.
SkyFeistyLlama8@reddit
Plot twist: OP is an Nvidia billionaire or hundred-millionaire, one of the early joiners with a ton of stock options.
Thicc_Pug@reddit
right, he used to be a billionaire
Thalesian@reddit
Just checked the post history and yup. At least.
xb4r7x@reddit
Like most people in the tech industry...
Deep90@reddit
Does it count if you have a million in debt?
Noiselexer@reddit
uhuh, why even bother with sparks then?
Dry_Yam_4597@reddit
Damn, that's nice.
quadiuss@reddit
Selling 16 of them just to get three H100s
woobchub@reddit
Away
SuperLucas2000@reddit
Chrome with 3 tabs, i dare you
Kurcide@reddit (OP)
I only have enough confidence for 2 tabs
jimmytoan@reddit
With 2TB unified memory across 16 nodes, the big unlock is running 671B+ parameter models at full precision with long context windows. The sm121 missing kernel issue is real though - older LLMs won't run without workarounds. Best bet right now is Kimi K2.6 with vLLM using eugr's nightly builds while the DeepSeek V4 PR gets merged. Prefill throughput will be exceptional but token generation will cap around 20 t/s regardless of node count - if generation throughput matters, hybrid with Mac Studios for the decode step.
Kurcide@reddit (OP)
that’s exactly what I want to do once M5 Ultras come out. Add some Macs to the rack
yammering@reddit
16 is, um, a lot. Kimi K2.6 runs very well on my eight node cluster with vLLM using eugr's nightly builds. There are unmerged PRs for Deepseek V4 for vLLM. Flash runs fine on 8x, Pro could fit on your 16. You will get monster prefill numbers, but no matter what you do, token generation will average around 20 t/s.
Kurcide@reddit (OP)
I’m hoping to eventually add Mac Studio M5 Ultras to this for token gen and have the Sparks be prefill
yammering@reddit
Do you know what software stack you'd use for that? The sparks are quirky in that even older LLMs like DeepSeek 3.2 don't run due to missing sm121 kernels for some types of attention. It'd be awesome to frankenstein that but i'm skeptical.
Xlxlredditor@reddit
I believe eXo supports doing prompt processing on the Sparks and then running generation on the M5 Ultras
-dysangel-@reddit
Whoah. I might have to try this with my M3 Ultra..
Xlxlredditor@reddit
Not yet apparently. I thought they already did but no
-dysangel-@reddit
They do: https://blog.exolabs.net/nvidia-dgx-spark/
TechTwentyTwo@reddit
They demonstrated it in October of last year and wrote that blog post stating that 1.0 would include capability for disaggregated P/D across Nvidia and Apple silicon, but when they released 1.0 in January 2026, it wasn't and still isn't included. There has been progress on it as recently as the last couple of days (PRs #1993, #2000), so it probably won't be too much longer before this is ready to ship. Keep an eye on PR #1776 on the Exo github
TechTwentyTwo@reddit
Not yet
Xlxlredditor@reddit
Crap. They wanted to, iirc?
Badger-Purple@reddit
nope
MrAlienOverLord@reddit
i think you are actually better off running raw vllm on the sparks then adding the macs to it - the exo approach with heterogeneous networks has massive latency to transfer the state, and to my understanding it's mostly llama.cpp that runs on those .. -> way way way too slow to be useful - their benchmarks don't tell the full story as they run llama.cpp on the sparks, which no one in their right mind would do
TechTwentyTwo@reddit
I am trying to set this up at this very moment. I have 4 Mac Studio M3 Ultra 256 GB coming. The first two will be here tomorrow and the other two in a week. I already have two DGX Sparks
averagepoetry@reddit
Please update if this works! I have m3 ultras as well and would love to pair them with the dgx spark.
Fit_Concept5220@reddit
For anyone interested, the estimated prefill for dense Gemma/Qwen would be around 130k t/s. That means a 100k prompt would be processed in literally a second. The estimated token generation on the (as of now hypothetical) M5 Ultra would be around 70-80 t/s on q4 quants.
I must admit to myself that I was deeply wrong about the dgx spark - this is a monster machine for a prefill cluster, and the setup with dgx plus studio is a genius example of out-of-the-box thinking. Thanks for sharing OP.
Kurcide@reddit (OP)
It’s absolutely possible to have a 16x cluster
vVolv@reddit
I'd love to learn more about how you're clustering them - I haven't looked too deeply into it, but I recall prior to launch it mentioned you could link two of them, and presumably it would be a limitation in the dgx OS. To be clear, I'm not saying it can't be done, I just would like to know how.
Badger-Purple@reddit
The switch he got will allow for that kind of cluster
vVolv@reddit
Yeah, I got that but I thought (obviously incorrectly) that they'd baked the limit into the software
Cane_P@reddit
They have never limited it, they just don't support it officially. Any problems are up to you to fix; they won't do it.
Sea-Replacement7541@reddit
Dumb question. But by prefill you mean the time to process the prompt?
So people count the time to process the prompt and then the time for token generation, which is the actual output?
illforgetsoonenough@reddit
Prefill = prompt processing. Decode = token generation.
More-Curious816@reddit
Yes. Both are important; if one is slow, your output is slow. Like the Spark has monster prefill but crappy tg, while MacBooks have crappy prefill but decent tg.
worldburger@reddit
How will you do that with Mac Studios?
Does EXO do disagg prefill-decode?
Capable_Site_2891@reddit
exolabs.net
worldburger@reddit
Does EXO now do disagg prefill decode?
MajorZesty@reddit
Their repo makes it sound like Linux support is currently CPU only and I can't find anyone talking about using disagg this way, only wanting to. Feels like there'd be a lot more info on this, but I'm still gonna dig some more.
NoFaithlessness9789@reddit
What about https://github.com/Scottcjn/exo-cuda ?
Badger-Purple@reddit
no one has replicated their “experiment” and I’m pretty sure it was more marketing than reality
Capable_Site_2891@reddit
There is less of a reason to do so now - with the m3 Mac vs the spark it was 11:1, with the m5 it's 3:1. If m5 ultras came in a 512gb configuration at a decent price point, the spark would be almost redundant for this.
ItzDaReaper@reddit
Which chat room? Can I join?
ifheartsweregold@reddit
2x Spark Owner here….all I can say is good fuckin luck with that.
ComfortablePlenty513@reddit
nvidia and mac are two entirely different stacks, so idk how you'll manage.
cwr252@reddit
Honest question: why not use API at this point? Is it because of privacy?
ServiceOver4447@reddit
why get married when we can fuck for a hundred bucks
AlienRedditMaster@reddit
Same answer ? to have kids ?
ServiceOver4447@reddit
you don't need to be married to have kids
FatheredPuma81@reddit
Because someday you might want to have sex again.
ServiceOver4447@reddit
best blowjobs are from some prostitutes, why? because they are experienced.
FatheredPuma81@reddit
Cause you want sex.
Gravefall@reddit
because condoms
pm_me_tits@reddit
Except in this analogy we're rawdogging the api (aka they can read your input)
cwr252@reddit
Fair point haha
SKirby00@reddit
I'm actually kind of curious about this myself, so I did the math. Here's a breakdown of why it could make sense for someone to do this. It makes a bunch of completely baseless assumptions that probably don't all hold true for OP.
He probably spent ~$75K USD on this before tax ($4,700 MSRP × 16 = $75,200). Given the size of the investment, I'm just gonna go ahead and assume that someone making this kind of purchase has a business and will be able to write this off as a business expense (or more likely, write off its depreciation over the next few years). Assuming they expense the depreciation and then recuperate the residual value in a few years (let's assume ~$3,000 USD in 3 years), these could easily have a true/effective cost closer to $4,700 - $3,000 = $1,700, and $1,700 × (1 - 0.30) = $1,190 per unit (this baselessly assumes it would be offsetting income that would otherwise be taxed at 30%), or $1,190 × 16 = $19,040 total. So in this hypothetical the cluster would have a ~$19K effective/net cost over 3 years (or ~$6.35K per year).
Now let's see how much API usage it takes to hit ~$6.35K per year. For Kimi K2.6, it's $0.95/1M input and $4/1M output (edit: I made a mistake here, see my note at the end). Baselessly assuming a ~3:1 input to output token ratio (this varies a lot by use case), that's about $6.85/4M tokens total, or about $1.71/1M on average (note however that there seem to be K2.5 providers that offer ~half this cost). At that price, they'd need to process ~3.7B tokens (at that same 3:1 ratio) per year to reach the same cost. If this cluster is running 365 days/year, that's ~10.15M tokens per day, or 423K tk/hr, or 7,050 tk/min, or 117 tk/sec. Considering this is for combined input and output, that feels very feasible to surpass with such a big node, but it also hinges on a 24/7/365 usage assumption which is likely unrealistic. There's one big caveat though... I didn't factor in electricity at all, and frankly I don't feel like it.
Anyway, with enough usage, the right tax/cost recuperation factors in place, and relatively affordable electricity, it's very possible for this to be comparable to cloud models in term of economics, at least for a business.
There are also other factors though. Off the top of my head, I can think of:
- Privacy re: valuable business information
- Privacy re: client or employee information (incl. possible contractual obligations/restrictions & legal requirements)
- Cost stability/predictability
- Different accounting treatment for investments vs operating expenses (varies greatly depending on where he's located)
- Response latency
- Independence / self-reliance
- Stability / predictability (quality won't suddenly change out of the blue, and they won't be forced off of one soon-to-be-discontinued model at an inconvenient time to optimize all their work around some new model)
- Better looking balance sheet with these assets on hand could feel more comfortable for investors or debtors
- More end-to-end control could mean better optimizations around caching, which could help reduce costs
Conclusion: the margins are pretty tight, but with enough utilization/uptime, this could achieve significant non-monetary benefits at a reasonably low relative cost increase, or potentially even a cost reduction compared to using an API. But this requires HEAVY utilization and reasonable electrical costs.
Wait a minute... I forgot to adjust the API cost for the ability to write it off as business expenses at a similar rate as the depreciation. I don't feel like adjusting the math on that, but it definitely does make it harder to achieve a similar cost. Not impossible though.
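If anyone wants to poke at the assumptions, here's the same break-even arithmetic as a quick script (same baseless numbers as above, electricity still ignored):

```python
# Rough break-even sketch using the same assumptions as above (MSRP, resale,
# 30% tax offset, Kimi K2.6 API pricing, 3:1 input:output ratio).
# All figures are illustrative, not OP's actual costs; electricity is ignored.

units = 16
msrp = 4_700                     # USD per Spark
resale_in_3y = 3_000             # assumed residual value per unit after 3 years
tax_rate = 0.30                  # assumed marginal rate for the write-off

net_per_unit = (msrp - resale_in_3y) * (1 - tax_rate)    # ~$1,190
net_cluster = net_per_unit * units                        # ~$19,040 over 3 years
per_year = net_cluster / 3                                # ~$6,350/year

# API side: $0.95/1M input, $4/1M output, 3 input tokens per output token
blended_per_m = (3 * 0.95 + 1 * 4.00) / 4                 # ~$1.71 per 1M tokens
tokens_per_year = per_year / blended_per_m * 1_000_000    # ~3.7B tokens
tokens_per_sec = tokens_per_year / (365 * 24 * 3600)      # ~117 t/s sustained 24/7

print(f"net cluster cost: ${net_cluster:,.0f} (${per_year:,.0f}/yr)")
print(f"break-even usage: {tokens_per_year/1e9:.1f}B tokens/yr ≈ {tokens_per_sec:.0f} t/s around the clock")
```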
ItzDarc@reddit
Don’t forget though that the cloud models are losing hundreds of millions every day and will likely be unable to sustain that for many more years. The price will likely go up especially for business application. They’re just currently in the “addict the culture” phase of this particular drug. I believe long term this will end up being the less expensive way to do it by far.
ThunderGeuse@reddit
He can actually write off 100% of the depreciation in the first year thanks to OBBBA extending section 179 expense deductions.
han4wluc@reddit
what about electricity costs?
Ok_Warning2146@reddit
Why not just buy 8xRTX 6000? That should be faster for both prefill and inference.
SKirby00@reddit
I don't feel like doing the math for that lol.
It's much less memory though and might not be able to fit the very biggest models that he wants to run.
Ok_Warning2146@reddit
It is 768gb. Good enough for a quant of kimi 2.6. You can also use it for computationally intensive video gen
Cane_P@reddit
Not as much memory? If you are already in this economic ballpark, then you could buy a DGX Station instead. It will definitely have more tokens per second than the Sparks. But I would probably wait for the next version, since the memory (that isn't HBM) has a lot higher bandwidth compared to the Blackwell version.
ormandj@reddit
Any idea when that might be coming?
werther41@reddit
We're currently building a Parabricks server; a clinical setting needs full data control - if you post patient data into any LLM through an API, you have no idea where it ends up. The setup we have costs around 50k-70k, 2x RTX Pro 6000 96 GB VRAM. This cluster setup has a lot more unified RAM.
AnonsAnonAnonagain@reddit
But for $85k it's confirmed he could have gotten an MSI DGX Station GB300, which would outperform 16x DGX Sparks, especially since the Sparks do not have commercial Blackwell (the Sparks are missing TCGEN05).
(What is TCGEN05?)
ClickClawAI@reddit
First off, great work on doing the maths.
But you also left out another reason to do local over api… it’s way more cool!
_BigBackClock@reddit
why do we buy cars instead of leasing?
Ok_Warning2146@reddit
Well, you can get a better car for the same money in the form of 8x RTX 6000.
nochkin@reddit
More like why we own a car instead of taking taxis.
muyuu@reddit
if you already have the hardware, why not?
cwr252@reddit
I can see that… just seems a bit expensive to buy it in the first place, doesn’t it?
muyuu@reddit
well, i'd say so, but there are definite advantages
you can run other configurations different than the ones offered by API, you can make it deterministic for instance which is useful for testing, you can rely on it being available in the future for specific workflows, etc etc
this is /r/localllama after all, you'd think people appreciate the possibilities
yammering@reddit
Where’s the fun in that? Also this is r/localllama not cloud :)
Roll_Future@reddit
I thought kimi k2.6 needs a monster with a shit load of ram and at least 2xh200. Am I missing something?
yammering@reddit
8 sparks is slightly less than 1TB of VRAM. That's enough for the 660GB of model weights and lots of KV cache. The downside is that you only get ~20 t/s generation.
TheAncientOnce@reddit
what kind of speed are you getting?
somatt@reddit
Can you give any advice for learning to run LLMs sharded across clusters
yammering@reddit
There are a lot of options, and unfortunately the docs online are often out of date. I prefer vLLM at the moment but ignore everything in their docs about Ray, it is terribly unreliable (at least on my sparks) and native clustering works better.
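If it helps, the offline vLLM API ends up looking roughly like this once the multi-node plumbing is in place - model id and parallel sizes here are just placeholders, not a tested recipe for the Sparks:

```python
# Rough sketch of vLLM's offline API for a sharded run - NOT a turnkey recipe.
# Assumes the multi-node backend (Ray or whatever you end up using) is already
# up across the nodes; the model id and sizes below are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="moonshotai/Kimi-K2-Instruct",  # placeholder: whatever big MoE you actually serve
    tensor_parallel_size=16,              # one GPU per node in a 16x cluster
    max_model_len=131072,
)

outputs = llm.generate(
    ["Explain what prefill vs decode means for a MoE model."],
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```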
somatt@reddit
Cool, if you do, please pm me a link, I would love to see it. I was looking at petal?
running101@reddit
How can you run k2.6 across multiple machines? what mechanisms do you use?
siete82@reddit
vllm
Porespellar@reddit
Sparkrun.dev
TokenRingAI@reddit
Is that token generation number with or without speculative decoding?
yammering@reddit
Without.
bick_nyers@reddit
What's the prefill speed for Kimi? Are you using NVFP4?
yammering@reddit
kimi is natively int4 so i just kept it at that for accuracy. about 1500-1600 pp t/s at max context size.
Pupsi42069@reddit
Factorio
severemand@reddit
Reddit, is this a new trend that this generation is doing instead of super or muscle cars?
People buying stockpiles of compute and then going to reddit to flex and ask what they should run on them?
Run what you bought them to run, probably?
ChocomelP@reddit
Imagine you could buy a Bugatti and then actually drive it everywhere at max speed all the time.
Direct_Turn_1484@reddit
Dude. How are you linking them? Daisy chain them all together or do you have a 16 port 200Gbps switch?
Kurcide@reddit (OP)
I bought one of these:
https://www.fs.com/products/352159.html?now_cid=4319
Deep90@reddit
The city is going to think you're growing weed with all the heat and power usage lmao.
ChocomelP@reddit
I just found the perfect cover. I should start a weed farm to hide the fact that I'm running GPUs.
SharpSharkShrek@reddit
If you don't mind me asking: why do you "need" all this hardware? Wouldn't it be much more cost effective to use online services if you're not selling AI solutions somehow and just using them?
Direct_Turn_1484@reddit
Nice. Wish I could have my own small scale data center.
Status-Secret-4292@reddit
I have to ask.
How much did this run you?
What do you actually do with LLMs?
What do you do for a living?
DownSyndromeLogic@reddit
I'm pretty sure you already have an idea of what you're gonna run. I mean, why else would you spend fifty or a hundred thousand dollars on all this equipment? You didn't just do it to post on Reddit and ask us what to do. Tell us what you're actually going to run.
Endless7777@reddit
Why did you buy them??
Playful-Cat-4226@reddit
u should run for president.
CubicalMoon@reddit
How do you end up with $75000 worth of tech and no idea what you actually want to achieve with it?
nickN42@reddit
Mate, are you a kid or something? Guy clearly does this professionally, he's here just to flex on us, poors. I would absolutely do the same in his situation.
electrosaurus@reddit
He's not the one that sounds like a kid.
Low-Boysenberry1173@reddit
Professionally? What the heck can you do with these pieces in a professional environment? This is far from any professional context. It is just a bingo bullshit setup for fun.
electrosaurus@reddit
These are worse than AI bot slop posts and should be banned from the sub, really.
ThisWillPass@reddit
People spend the same on cars and rarely even drive them, which has been normalized for a long long time unfortunately.
SleepAffectionate268@reddit
but that car may lose, what, at most 50% of its value in a few years? The dgx sparks will be worthless in a few years, because we will have way higher ram and compute, as with all tech, but with cars it depends
fitechs@reddit
You don't have a car to drive it all the time, but to drive it when you need to
Successful-Total3661@reddit
Approximately how much power will it draw to run this cluster?
NetZeroSun@reddit
I know this is some serious flexing but I have to ask. What is this all for, honestly, and how did you pay for it / what's your job?
VegetableDelay1658@reddit
Check his posts, bro drives an aston and a lotus and wears AP and rolex
bobdvb@reddit
He was also into crypto.
Also collectables.
Also stocks.
I can't decide if he got fortunate along the way or just follows the wind with someone's money.
uhuge@reddit
*think* The more you buy..
ICanSeeYou7867@reddit
Honestly....
I would set them up as kubernetes worker nodes with the nvidia gpu operator and the Kai scheduler... if the gpu operator node supports the GB10.
However you wouldn't be able to "combine" them easily. But it would be interesting!
norskyX@reddit
Adobe Flash /s
MotokoAGI@reddit
Ken, please stack the DGX Sparks on the shelves. The store is opening in 15 minutes.
PrestigiousDrag7674@reddit
I gotta show this on Reddit
beryugyo619@reddit
make no mistakes and make sure to include free tungsten cubes
Firewormworks@reddit
Hahaha
drox63@reddit
Let me get this pic out for the gram first Phill.
Raredisarray@reddit
Lmfao
Hearcharted@reddit
🤣
PrestigiousDrag7674@reddit
Let’s see your racks
Turbulent-Walk-8973@reddit
I have a single DGX Spark, and I never managed to get above 45 t/s with qwen3.6-35b-a3b at Q8. Am I doing it right? I see so many people with 80+ on RTX GPUs for qwen3.6-27b, so I feel something is wrong somewhere. Or the dgx spark is the wrong thing to buy.
GabryIta@reddit
Kimi 2.6/GLM5.1!
Powerful_Evening5495@reddit
Send a brother one, and I will pray for you.
"May God increase his tokens to infinity."
sxt87@reddit
But why?
Revolutionary_Rub530@reddit
Gemma 4
Hambeggar@reddit
Aren't these kinda shit since they don't have TMem.
amp804@reddit
minecraft let the different models form clans lol
aomogol@reddit
Tetris 😄
FederalSun@reddit
Give me one lol
Necessary_Pride1093@reddit
doom of course
SnooDogs7747@reddit
Lowest settings
AcreMakeover@reddit
Might be able to handle medium if you're ok with 30 FPS.
Either_Audience_1937@reddit
At 480p
Intelligent-Staff654@reddit
To my place with 1 or 2 of them to drop off
_ytrohs@reddit
To the bank
spliffsandshit@reddit
Unfortunately this is going to be painfully slow and inefficient. Processing speed will be great though.
Fearless_Weather_206@reddit
Did you buy them at discount?
admiral_corgi@reddit
Probably going to need to upgrade your electrical lol, this looks like an insane amount of power draw
Kurcide@reddit (OP)
Already have a newly run sub panel in the house with 240V circuits
optomas@reddit
All that, and no 3p 480V?
VestedLoves@reddit
The crypto/nft loser to AI loser pipeline is real.
CrypticZombies@reddit
check yo electric bill
mr_zerolith@reddit
Return them and get 4 RTX PRO 6000's.
384gb of vram is pretty decent, and you'll have about the same performance as 8 of those.
JustTesting314@reddit
Send me one, I'm struggling with my 24GB of VRAM. Invest in my business 😁. That being said, try Deepseek Pro.
Master_Zack@reddit
sir are you a billionaire
jinnyjuice@reddit
What are you going to run them for?
Your choices are probably going to be between MiMo V2.5 Pro, DeepSeek V4 Pro, GLM 5.1, MiniMax M2.7, depending on the answer. DGX Spark's bandwidth is not that high, so go with a 4 bit quant AutoRound, vLLM if multiple users, SGLang if single user or two maybe three depending on usage intensity of each user.
Kurcide@reddit (OP)
This is all actually good advice. Appreciate it.
I was going to run Deepseek, i'm trying out SGLang on 8 of the nodes now but it looks like there are still some issues with SM121
BIOffense@reddit
Honestly the only actual advice you've gotten in this thread, but really, it's extremely rare to find someone with this hardware, let alone 16 of them. Generally, you should hire.
One SGLang on 8 unified nodes, right?
For which model? What errors do you see?
Turbulent_War4067@reddit
I have seen quite a few folks advise him to run Doom. It seems to be the most popular answer :)
Kutoru@reddit
There are no "issues" with SM121. It will work, you just don't get optimized inference paths if running an outdated version or not supported.
Anything used by localllama can likely be finagled to work if you dive into the weeds, most of the time just recompiling for the CUDA arch or doing tiny changes like adjusting for the SM cache sizes (which are smaller).
ocassionallyaduck@reddit
Run from the banks.
They're going to repo your house.
Dry_Shower287@reddit
I think even though 20 Sparks and one DGX Station are the same price, the Station offers much better value because of its insane speed.
No-Comfortable-2284@reddit
can 270gb/s bandwidth rly run anything at meaningful speeds
spaceman3000@reddit
I have a strix halo, similar bandwidth. Dense models are slow but MoE are fast even at 120B
No-Comfortable-2284@reddit
yea, gpt oss 120b only processes ~5b params during inference so it is fast, but what do you do with 16 dgx sparks.
spaceman3000@reddit
No. Maybe the guy works for Nvidia and got them free, or this is marketing (he posted on several subs), or his company bought them for developers and he will just do a temporary cluster for fun.
No-Comfortable-2284@reddit
gotcha, thanks for that. yea, pretty much my thought. didn't think you could run anything at meaningful speeds that requires 16 dgx sparks. really enjoying my spark though. it's so quiet and nice to just have on 24/7, fine tuning and just hosting my website.
Vancecookcobain@reddit
Goddaaaammnn....how much wattage is that?
On the flip side you probably can run DeepSeek v4 pro right? Well whenever it comes out if the weights haven't been released yet
Alternative_You3585@reddit
Bro 💀
Just run Kimi and be happy, tho I assume the speeds are gonna be slightly painful
sn2006gy@reddit
i pay 80 bucks for unlimited kimi basically.. that’s less than the electricity for those machines to be on
Kurcide@reddit (OP)
The entire system is 200Gbps node to node. Eventually I want to see if I can use these for prefill and cluster Mac Studios in for token gen after the new ones come out eventually
ceinewydd@reddit
NVIDIA wired this with PCIe 5.0 x4 to the SoC, so it's 200G in terms of what links up to the switch, but practically speaking it hits 109Gbps and runs out of gas. Patrick from STH covered this in a video about clustering eight units together recently.
Kurcide@reddit (OP)
I confirmed on my current 8x Spark cluster. Single 200G cable per node, FS N8510 switch running RoCEv2 with PFC/ECN, MTU 9000.
The PCIe 5.0 x4 ceiling is real but NVIDIA did something weird with the wiring. Each physical QSFP port is fed by two separate PCIe x4 links that show up as twin logical RDMA devices in the OS (rocep1s0f1 and roceP2p1s0f1). So that ~111 Gbps cap is per x4 link, not per cable.
Saturate both x4 links across the single cable (NCCL_IB_HCA pointing at both twins) and you get ~199 Gbps through one physical port. NVIDIA basically split one 200G port across two PCIe x4 paths because they couldn't give it x8 lanes.
Per-flow workloads still cap at ~111 Gbps. Per-node aggregate gets to 92.5% of theoretical 200G if you use both twins. NCCL handles it transparently with NCCL_IB_HCA=rocep1s0f1,roceP2p1s0f1.
So the 200G is real, you just have to know how to actually extract it.
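For anyone trying to replicate, the relevant setup looks roughly like this - the twin device names are what show up on my nodes, while the RoCE GID index and bootstrap interface name are just examples you'd need to verify on your own boxes:

```python
# Sketch of the NCCL environment described above. The NCCL_IB_HCA value uses
# the twin RDMA devices that appear on the Sparks; the GID index and bootstrap
# interface name are assumptions/examples - check `ip addr` and your NIC config.
import os
import torch.distributed as dist

os.environ["NCCL_IB_HCA"] = "rocep1s0f1,roceP2p1s0f1"   # saturate both PCIe x4 links
os.environ["NCCL_IB_GID_INDEX"] = "3"                    # RoCEv2 GID index (commonly 3; verify per NIC)
os.environ["NCCL_SOCKET_IFNAME"] = "enp1s0f1"            # example bootstrap interface, adjust per node

# rank / world size come from torchrun or env vars as usual
dist.init_process_group(backend="nccl")
```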
thehpcdude@reddit
Why not actual IB? RoCE is meh and introduces latency that you don't want. IB is dead simple.
ragingpanda@reddit
I think the connectX firmware for the dgx spark is hard locked to ethernet. Not sure if anyone has gotten around that (yet)
CommunicationOld9889@reddit
Clusters are so slow.
burger4d@reddit
Please post some performance numbers after you get everything setup, I’m very curious
ATK_DEC_SUS_REL@reddit
You’re gonna go far kid.
comp21@reddit
I thought the largest cluster you could make is eight? How is 16 going to work?
CreamPitiful4295@reddit
My amp goes to 11
DukeOfPringles@reddit
One problem, if you're in America at least: your wall circuit will blow if about 12 of them run at a load of 120 watts each, so either you have two independent circuits near each other (with nothing else plugged in) and a REALLY long network cable to attach the routers, or you own the home and got an electrician to do some rewiring. I can think of a lot better ways to spend 64k.
Kurcide@reddit (OP)
I own a home and had an electrician add a dedicated panel and 2 240 industrial outlets for the rack
DukeOfPringles@reddit
My level of jealousy is so damn high, I have to limit my setup to not trip the breaker.
codingafterthirty@reddit
I want to be DGX Sparks rich. And that is awesome. Would be interesting to compare a large DGX cluster vs a Mac Studio cluster. Lol, me, I am just rocking an AGX Orin 64gb. Slow as hell, but gets the job done.
My_unknown@reddit
Try creating AI slop and post the videos to social media to buy more of them
My_unknown@reddit
Run a courier website and send them to me 😁😁
Wubbywub@reddit
a charity
Fancy-Restaurant-885@reddit
Jesus fucking Christ, just - how do people have so much money just burning a hole in their pocket?
MisticRain69@reddit
And here I thought the dudes making the dual rtx pro 6000 rigs were rich. Damn, this dude makes the guys with those rigs look like us poors.
Badger-Purple@reddit
Motherfucker also bought all the reaper sauce.
Kurcide@reddit (OP)
lmao that was 6 years ago man… I just wanted that delicious reaper sauce. It was the closest thing we had in like a decade to the volcano sauce
Ok_Warning2146@reddit
Would this setup be faster than a 1.5TB RAM + one RTX 6000 setup?
Kurcide@reddit (OP)
the 1.5tb of ram won’t help in that example. only the 6000 pro
the 6000 is faster than a spark but the sparks just have so much more unified memory
Yosanga@reddit
Donate
Ikkepop@reddit
a fucking hedge fund dude, you seem to be loaded
Full-Sense5308@reddit
This is no longer local llama 😂
thawizard@reddit
OP going full localDataCenter.
johnnyhonda@reddit
Why would you buy 16x DGX Sparks, and then go to reddit to ask people what to run on them?
Kurcide@reddit (OP)
for Karma?
thawizard@reddit
Can I have one when you’re done playing with them? 😅
Mythril_Zombie@reddit
You didn't buy them, how did you get them?
Kurcide@reddit (OP)
I definitely bought them
cddelgado@reddit
Can I have one? Rather, can my university have one?
Live-Possession-6726@reddit
Atlas - atlasinference.io
Porespellar@reddit
Why did you not opt for a GB300 DGX Station? They are out now from several vendors and I think are running about $90K
PrysmX@reddit
That's still "only" 768GB lmao.
Kutoru@reddit
Just for some clarity. The support for mixed bandwidth workloads is extremely poor (outside of .cpu()) and rightfully so as it is not worth the complication to support.
It is better to treat it as a 252GB HBM2E GPU and a 496GB LPDDR5X GPU. Then there are also time-sharing complications, and you have to be very careful to make sure the LPDDR5X data doesn't go through HBM2E before hitting the GPU - as you'd want a similar experience to DGX Spark.
PrysmX@reddit
Yep, I just didn't want to overcomplicate my response since 768 < 2048 anyway. 😄
MajorZesty@reddit
Hm, only numbers I've seen is closer to $150k, but they're all custom talk to sales stuff. Haven't seen anyone post actual quotes.
pheoxs@reddit
https://configurator.exxactcorp.com/configure/VWS-158270643
95k for 496gb of ram and 252gb of hbm3e
MajorZesty@reddit
Nice! Thanks for the link. It's cheaper than I thought it'd be. Not that it's in my budget lol
pheoxs@reddit
Have you considered cutting avocado toast out of your budget /s
Sad-Enthusiastic@reddit
Roblox
thefox828@reddit
Did you get a better price ordering so many?
Kurcide@reddit (OP)
yes, got them slightly below original retail. So saved like $550+ on every node
Blackdragon1400@reddit
I’m still mad losing $700 buying my 2 sparks a week apart after the price hike
DaMan123456@reddit
Whatever the hell you want! lmao :D
Allseeing_Argos@reddit
What should you run? You should run from me.
Smultar@reddit
I'd kill to get one of those, but cant afford em
Kinky_No_Bit@reddit
16..... 16.... @ how much a piece? $4,699.00 .... sooooo..... $$$ 75,184 dollars.... O.o
somerussianbear@reddit
Run back to the shop to return this crap.
epSos-DE@reddit
Gemma 4 IS GOOD!
Kimi is good!
The online version of Kimi is better than Claude, because it reasons better, BUT fanboys are going to hate if you say it!
prince_pringle@reddit
Serve chess agent matches! I just finished a rust/gpu chess engine for the spark
emteedub@reddit
Chain 14 together, then send 2 my way
miltonthecat@reddit
First go watch this video from ServeTheHome which is the closest thing you’ll get to an instruction manual for a cluster of this size.
https://youtu.be/uYepcMoqvKQ?si=73k7DjTk-HqgPEON
SanDiegoDude@reddit
Dude I love my DGX, I develop on it constantly and it's rad... but it's ungodly slow. I can only imagine what trying to run a massive model that the 2TB would support would be like, when I get impatient just waiting on Qwen 27B to hurry tf up, lol. I'm jealous, but also please please please share what your actual t/s numbers are once you can run one of those open source monsters that are dropping out of China.
SpearHook@reddit
My hat's off to you. I have one hooked up, working on my second. Do you need a dedicated/special power hookup for that many rigs?
ArthurParkerhouse@reddit
lol, is this from spare pocket change, or a 2nd mortgage?
Ok_Campaign6438@reddit
Doom in 4d
kyr0x0@reddit
AFAIK you can only pair 2?
Kurcide@reddit (OP)
with a switch you can pair as many as you want
Embarrassed-Rip-3205@reddit
Bro, reading your posts... did you get rich from dogecoin?
Kurcide@reddit (OP)
lmao did you go back 8 years in my posts?
markstar99@reddit
At this point you can train AGI on your own
deepsky88@reddit
my mom
LavenderDay3544@reddit
Doom
jaysin144@reddit
Extension cords and air conditioning.....
LifeguardPuzzled3212@reddit
yourself outside to touch some grass
InfiniteClick@reddit
Shouldn’t you have been asking that question… before ?
firest3rm6@reddit
Minecraft Server
TheyCallMeDozer@reddit
You should run one over to me in the post lol ... isn't that like $80,000+
utf16@reddit
Will it run Crysis at full 4k?
Kurcide@reddit (OP)
I need 16 more sparks for that
thebloodreaper6739@reddit
im curious, what does your work look like to make use of hardware at this scale ?
philmarcracken@reddit
The most rich phrasing ever. None of the rich ever do anything manual themselves lol
Kurcide@reddit (OP)
I’m literally crawling behind the rack and doing it myself. No fun if someone else did it
oftenyes@reddit
I thought you could only connect two sparks formally and three informally. Is that not true anymore?
Kurcide@reddit (OP)
nope, with a switch you can connect as many as you want
Foreign_Aid@reddit
Inversion: The Shortest Path to Disaster

The most direct way to burn $100,000 with zero usable results is assuming this setup will function like an enterprise data center. Here is exactly why trying to run a massive 1T model across this cluster for real-time chat will systematically fail:

The Communication Bottleneck: Professional nodes use NVLink, offering speeds around 900 GB/s. This cluster communicates over copper ethernet cables at 200 Gbps (yielding roughly 25 GB/s of actual throughput). If you shard the weights of a massive model across 16 nodes, transferring activations over the network will take significantly longer than the compute itself. The system will technically work, but latency will render it practically useless.

Compounding Second-Order Costs: 16 compute nodes running 24/7 plus a high-throughput switch will generate a continuous multi-kilowatt power draw. This will rapidly max out your residential electrical infrastructure and mandate an immediate, expensive, and loud dedicated cooling setup, completely defeating the purpose of a "home" lab.
Powerful_Ad8150@reddit
Nah, maybe he simply lives in Europe? 16 x 150W = 2400W, lots of spare wattage for other stuff. And we're still talking about a single socket and a single 16-amp fuse, while typical new connections where I live are like 20kW.
Serprotease@reddit
You're replying to an AI comment with messed up markdown formatting. And it's quite wrong/out of date. NVLink is not really a thing anymore, and 200Gb/s is enough for a 1T model with 8-node clusters, per other users' experiences.
Mother-Agent7445@reddit
So why 16 sparks? Does this make the gpu bandwidth equivalent to like perfect inference response for lots of people?
Due-Opportunity6212@reddit
Question, I am a beginner, how the hell would this be clustered? Also, is the latency terrible?
And lastly, I wanna do that too, is it good?
Due-Opportunity6212@reddit
Will there be a video too? I wanna see it badly if it works.
Techngro@reddit
"You're a rich girl, and it's gone too far cause you know it don't matter anyway..."
onewheeldoin200@reddit
Jesus dude 😂
FusionCow@reddit
This is kinda ridiculous, I mean honestly the only models TO run are kimi k2.6 and deepseek v4 pro
patricious@reddit
You just called us poor in 16 ways.
TheWhiteKnight@reddit
if you want to feel poor go here -> https://www.reddit.com/r/Salary
Firewormworks@reddit
Wow, that did make me feel poor... Should have been a dentist.
More-Curious816@reddit
Most of the posts there are fake, like 250k? 400k? 700k?
TheWhiteKnight@reddit
FAANG salaries can be nuts, for example
Darkoplax@reddit
We really need /r/poorlocalllama
Minipiman@reddit
Doom
HoldAdministrative85@reddit
Money run money
poopsinshoe@reddit
Minesweeper
Select-Dirt@reddit
You can now goon at the speed of light! Congrats, you made it
iam-not-a-monkey@reddit
Doom perhaps?
mistrjirka@reddit
tell me this is ragebait XD how the fuck can you buy 16x dgx sparks and not know what to do with them. Like deepseek v4 or GLM 5
Legitimate-Pumpkin@reddit
I’d go for GLM AND DS4… AND M2.5 and Kimi 😂
TheDiamondSquidy@reddit
Money i’ll never get to enjoy
Fluffywings@reddit
A giveaway for everyone in this post!
All jokes aside the biggest open source model that fits.
Torodaddy@reddit
Oh you rich rich
dr_hamilton@reddit
You should run... a giveaway competition for folks here 😁
IrisColt@reddit
oops, deleted
AdventurousVast6510@reddit
all that money spent just to talk to an ai girlfriend faster
Kurcide@reddit (OP)
My AI girlfriend will be so smart thgh
IrisColt@reddit
heh
darkscreener@reddit
A simulation of the universe
villefilho@reddit
Minecraft, recommended settings
Toto_nemisis@reddit
Doom, that's what I would run
Adorable_Weakness_39@reddit
Space Cadet Pinball
ElChupaNebrey@reddit
Why not to make a test of all?
TwofacedDisc@reddit
Doom
Kutoru@reddit
I'm confused about the reason anyone would actually even consider 16x DGX Spark cluster for individual use. The DGX Spark is more suitable for larger inferences but that's just relative to its own inference performance.
Even for say clustering workloads, you can verify everything you need to on a 2x system (there are far more issues that can happen but those generally lie outside of the model-land).
There's nothing particularly special about 400gbps? Sure you don't see it on a consumer board but 400gbps is ~50GB/s and PCIE 5x16 has ~64 GB/s. So you can just sacrifice a PCIE slot for a Mellanox adapter.
ycnz@reddit
An ebay auction?
Final-Frosting7742@reddit
Run deepseekv4 at 0.5 token/s!
leopold815@reddit
Crysis
lqstuart@reddit
crysis
I_EAT_THE_RICH@reddit
You should run to the local charity and make a donation
the3dwin@reddit
There are a lot of comments so perhaps someone already made the comment:
If possible, use half of them to serve inference for the top 8 models that fit the hardware and run something like chatgimmy.ai, where you offer the use of the hardware over an API like OpenRouter.
Then for the other 8, I suppose put away 4 as backups and use the other 4 for everything and anything you can think of.
Heisenberg99_1_@reddit
Uncensored hentai models
skmagiik@reddit
Can you please run benchmarks bringing up various cluster sizes? I'm curious how much performance (tokens/sec) you get per DGX.
drox63@reddit
Why go this route and not get a full rack setup? I mean I know why I would want to do this… but why are you doing it?
Also could I have dibs on any units you will be decommissioning?
Kurcide@reddit (OP)
I have 8 a6000 ADA for sale that never got used
drox63@reddit
Can you dm me the link? Assuming the dgx spark is much cheaper on power and other infra to run, is that your theory?
Kurcide@reddit (OP)
https://a.co/d/0385KmCK
It’s slightly cheaper but tons more memory
Signal-Run7450@reddit
Run for your life, you might get attacked soon😂
uIDavailable@reddit
Idk why OP is asking this. They are cross posting in other subs with different titles and descriptions of the same exact post.
Kurcide@reddit (OP)
I posted one other time in homelab lol
FlyingDogCatcher@reddit
Can I come play at your house?
Kurcide@reddit (OP)
sure, come on down
forestryfowls@reddit
How does the high bandwidth networking work for this? Can you connect all 16 on one switch or do you need multiple? Can't wait to see updates! Just saw Serve the Home's write-up on 8 of these and that looked like a fun time.
Kurcide@reddit (OP)
a single 200Gbps switch with 16 ports is what you need. I have an FS with 24 ports
burnt1ce85@reddit
Holy smokes. How much did this cost you? Over $70k?
Kurcide@reddit (OP)
yes
Thistlemanizzle@reddit
Run an eBay API connection, sell all of it, and switch to API tokens.
lannistersstark@reddit
You're going to run a very very large model at 10 tps?
Kurcide@reddit (OP)
yup, and eventually see if I can just use the entire cluster for prefill
YairHairNow@reddit
Jeez, can we be friends? It's like having a friend with a pinball machine collection.
But yeah, I'd go deepseek, Kimi, and check out Nvidias gaussian splatting/3d asset harvesting models. Doing benchmarks on NVFP4 would be cool.
Hour_Bit_5183@reddit
LOL how rich are people in here? My god. spending all this money to make ????????. This is either a troll or a really dumb person who somehow made money.
Kurcide@reddit (OP)
Just dumb I guess. Gunna go cry into my GPUs now
LankyGuitar6528@reddit
Have you seen Elon? Being really dumb seems to be a prerequisite to making money these days.
vulcan4d@reddit
Nvidia Ad
Cellsus@reddit
Tetris
Prince_ofRavens@reddit
If you don't already have the answer to that question and a backlog of a couple months of answer to that question I feel like you made the wrong choice lol
JuniorDeveloper73@reddit
The world... millions starving and very few...
realzequel@reddit
So none of us should buy luxury goods? Slippery slope.
theowlinspace@reddit
No, but it's important to acknowledge unequal wealth distribution throughout the world. If you can't directly do anything to change it, the least you can do is recognize your privilege.
While you buy luxury goods, others are much less fortunate and are struggling to make ends meet. You can't change this, because it's principally a result of capitalism, and even donation only delays the problem, but understanding that your privilege is only the result of the plight of others, instead of just blindly saying "Well, I can buy luxury goods, what do I care if people are starving", is much more moral.
realzequel@reddit
Who's to say OP doesn't contribute a lot to charity? Want to change the world (or US at least)? Stop voting for the rich party (not you personally but people in general). Criticizing a reddit post is doing jackshit.
theowlinspace@reddit
Contributing to charity doesn't solve the root of the issue, and it's beside my point, I did mention that donation only tries to temporarily patch/delay the problem, not that it's a solution. Voting hasn't and won't ever change anything, the world will only change when the people organize.
I haven't criticized OP or any reddit post, all I'm saying is that recognizing the issue and hoping for a better future is better than trying to ignore it.
realzequel@reddit
So an armed revolution? Won’t happen in the age of entertainment. Too much bread and circus. Peaceful protests have done nothing as well. So your solution is unrealistic imho. The OPP was criticizing the post, not you.
MrHaxx1@reddit
Do starving people eat DGX Sparks, or what are you suggesting?
Kurcide@reddit (OP)
Just tried, they don’t taste good
DataPhreak@reddit
Oof.... bad deal. You could run A LOT of small models at a medium speed, or 3 Kimis at a snail's pace.
Stunning_Habit_6411@reddit
Use it to generate an image with 32
DarkShadder@reddit
I am new to this sub, are people of this sub really this insane?
Anarchaotic@reddit
Outside of pure "big model!!!", I'm really curious to see how concurrency works. This is a use-case for a small team that wants to focus on local-first, and so I'd love to understand how 4-5 different users would be able to send concurrent requests, or even what the realistic cap is for work.
Let's say you have 4 devs working on a codebase at the same time, does something like this give enough headroom for them all to have stronger models all working in tandem?
Turbulent_Pin7635@reddit
Run a shop
celsowm@reddit
Doom 1993
Codename280@reddit
You should run to the bank and donate some money. Cause clearly you're spending it without purpose.
Bozhark@reddit
Agentic subagents that operate individual agents per agentic service
PiratedComputer@reddit
Run Doom
nopanolator@reddit
You have the hammers, who care about nails lmao this fcked world now
gobblegoooblegobble@reddit
DMing you
LankyGuitar6528@reddit
Local instance of Mythos? Take over the world maybe?
aka_blindhunter@reddit
You run to me need the address
chensium@reddit
lol what!
typical-predditor@reddit
You should run one over to me.
_p00@reddit
Would love to run Pac-Man 256 on it. Should go fast.
BrianJThomas@reddit
Sometimes I'm tempted to do something like this. I'd probably have to pull power off of the dryer outlet in my 1br apartment. I wonder if anyone else is doing this...
RelationshipLong9092@reddit
It has to be GLM-5.1, at a total weight size of 1.51 TB.
You can fit Kimi K2.6 on just 8x Sparks, and other people have done so before. Boring!
But I've never seen anyone set up a 16x cluster, so you'd be the first (I've seen) to run GLM 5.1 locally on "consumer" hardware.
Beginning_Bed_9059@reddit
An electrical grid
Glazedoats@reddit
GLM 5 AND the new Deepseek. 🤭
AsiancookBob@reddit
You probably answered this somewhere in this thread, but I thought you could only link up to two nodes per the Nvidia docs? Unless they scaled it up.
TheRiddler79@reddit
If this is true, this guy just realized a nightmare
TheRiddler79@reddit
Mimo 2.5
Cless_Aurion@reddit
I've never seen such a WRONG comment section.
The answer is DOOM.
brenden77@reddit
You're wrong. It's FarCry.
CyborgBob1977@reddit
Looks like anything you damn well want...
shadowmage666@reddit
See if crysis works
Familiar-Virus5257@reddit
I laughed way too hard at this bc I am too old. I remember the days of "but can it run Crysis?"
BakeMajestic7348@reddit
Bro is older than 13
HIGH_PRESSURE_TOILET@reddit
It actually probably does tbh. There's a list of some popular games (though not Crysis) with approximate fps figures on the DGX Spark in the recent steam arm64 snap thread: https://discourse.ubuntu.com/t/call-for-testing-steam-snap-for-arm64/74719
Mochila-Mochila@reddit
Bro, you're crazy.
That's all and have fun 💪
tauronus77@reddit
Can it run Crysis?
Ok-Hotel-8551@reddit
Minecraft
redilaify@reddit
the hell do you do for a job
ACIeverName@reddit
Run the latest deepseek v4-pro-max
ChiptuneXT@reddit
Qwen 3.6 plus :) so in reality awesome to see multi agent with top tier models like Kimi
DeadLolipop@reddit
Surely for the price of these, you could get a big fuck off nvidia server
akumaburn@reddit
This may actually be a better use case for node-level MoE (a different model per Spark and a router that decides which model to send each request to).
You could also agent swarm smaller models as well.
MajorZesty@reddit
Did you compare purchasing this vs a DGX Station? Ofc, thinking about it this is probably still 3/4ths the cost depending on the switch.
SnottyMichiganCat@reddit
Yea. At this scale I think that would make more sense.
But, maybe weird circumstances made these available. 😅
Freonr2@reddit
"Funny story, there was a truck overturned on the side of the road..."
wazabee@reddit
Crysis?
SarcasticOP@reddit
This is the what they would steal off a truck if The Fast and the Furious was made today.
_BigBackClock@reddit
madlad
NBelal@reddit
Minecraft of course
eecchhee@reddit
World of Warcraft Classic
Rude_Ambassador_6270@reddit
run for your life!
sultan_papagani@reddit
any chance youre looking to adopt a fully grown adult?
bartskol@reddit
You should run charity and donate one to me.
cauchy2k@reddit
i would use them to watch youtube and netflix
marutthemighty@reddit
Are you starting a video game company? Or are you building a new AI company?
Sanity_N0t_Included@reddit
What should you run? Apparently a payday loan operation since you have the big bucks. 🤣
duhballs2@reddit
to my house with one of them.
Ok_Technology_5962@reddit
Deepseek v4. Show us how its done
SGAShepp@reddit
From your wife
Historical-Internal3@reddit
Think I would have gone with a DGX Station - just around $20k shy (lol).
Especially if you were considering adding Mac Ultras to the cluster.
Will easily pull 3-4k watts with this
geldonyetich@reddit
Same answer as to what a two-ton gorilla is to do: whatever you want.
Highest scoring large open model weights on artificial analysis are probably your best bet. Check weekly, it'll change.
Granted this being an Nvidia kit you might want to keep the latest Nemotron loaded. Sometimes the biggest flex is what you can do with the least effort.
createthiscom@reddit
How the hell would you power that
trueimage@reddit
Hi can you send one over? DM for my mailing address.
spencer_kw@reddit
run a routing benchmark. put 5 models on it, same prompts, compare quality and speed across task types. that's the data nobody publishes and it's worth more than any leaderboard. tools like openrouter and routers like herma let you A/B test models against each other on real workloads, that's where the interesting numbers come from.
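something like this against whatever OpenAI-compatible endpoint the cluster exposes (vLLM provides one) would get you started - model names and the URL below are just placeholders:

```python
# Rough sketch of the A/B idea: the same prompts against several locally served
# models, recording latency and output length. Endpoint URL and model names are
# placeholders; quality scoring would be layered on top of this.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
prompts = ["Explain RoCEv2 vs InfiniBand in two sentences.",
           "Write a SQL query that deduplicates a users table."]
models = ["kimi-k2", "glm-5", "deepseek-v4"]   # whatever the cluster actually serves

for model in models:
    for p in prompts:
        t0 = time.time()
        r = client.chat.completions.create(model=model,
                                           messages=[{"role": "user", "content": p}])
        dt = time.time() - t0
        text = r.choices[0].message.content
        print(f"{model:12s} {dt:6.2f}s {len(text):5d} chars  {p[:40]}")
```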
Accomplished_Steak14@reddit
OP could feed the whole of Africa with compute but chooses to run open clown at home
bromatofiel@reddit
But does it run Doom?
jd52wtf@reddit
Sell them all and buy a B300.
g_rich@reddit
OP dropped some cash on these, but a DGX B300 is upwards of half a million, which would be well over 5x what OP invested here.
jd52wtf@reddit
Thank you for that analysis.
Fit-Palpitation-7427@reddit
B300 is not 2tb of vram
eallim@reddit
Can you let 50 instances of Nemotron 3 Nano Omni talk with each other?
moonrust-app@reddit
qwen 3.6 27b
Skylleur@reddit
Can I please have one :3
rallypat@reddit
Holy fuck I’m poor
arm2armreddit@reddit
If you already have an H100, just give it to some school kids or donate it to a university.
ayu-ya@reddit
You can send one to me, I'll call it my Sparky and take very good care of it-
Racer4711@reddit
a truck.
Stike_1@reddit
What should you run - run your Lamborghini, I guess :)
7657786425658907653@reddit
dude your ai girlfriend must be so quick at tokens
More-Curious816@reddit
I, also, want this guy AI girlfriend.
Helicopter-Mission@reddit
I hope you’ve not spent all this for inference only
Healthy-Nebula-3603@reddit
For llms those are useless.
Antique_Juggernaut_7@reddit
What an awesome project. Congrats.
I imagine you know about all of this, but here goes just in case:
Just make sure you follow the discussions on Nvidia's dev forum on the Spark. There have been a ton of issues that Nvidia has left unresolved in the GB10; some of them even touch the consumer/workstation Blackwell product lines. The most important one is the most vexing for Nvidia, which is that NVFP4 is NOT natively supported, for a couple of reasons -- some of them software-related (I think these are mostly issues with CUTLASS at the moment), but some of them hardware-related (the GB10 actually doesn't have 5th gen Tensor Cores and that causes problems). These have been going on for a year now and the community is definitely frustrated.
Having said that, I am a happy owner of the two Sparks I own. If your project involves a lot of input tokens and/or a lot of concurrent requests, then a Spark cluster is very hard to beat.
Dependent-Wonder1366@reddit
turn it into a gaming rig
skydiver19@reddit
How long did it take for shipping and where did you source them?
machyume@reddit
Hey, when are you going to buy the power plant for your rig?
dataexception@reddit
Ummm... Pretty much anything you want. ;)
Anime-Man-1432@reddit
Marathon 🏃
unintended_purposes@reddit
https://huggingface.co/poolside/Laguna-XS.2
stormy_waters83@reddit
You should run to the post office and mail me one. Please and thank you.
Comfortable-Tie2933@reddit
🤫🥶
somnamboola@reddit
you should run yourself into a safe neighborhood
Substantial-Tax406@reddit
WHAT DO YOU DO FOR LIVING ?!!
misha1350@reddit
Palantir perhaps
Ok-Internal9317@reddit
💀
Deep90@reddit
His uncle invented Pokemon
htownclyde@reddit
trust fund
sparkleboss@reddit
Doom
billy_booboo@reddit
Send a couple to me, please
UnbeliebteMeinung@reddit
This is beautiful
charliex2@reddit
should get the asus ones instead, they're $1k cheaper and just have a smaller base drive. plus the thermals seem to be better - my gold sparks run way hotter than the asus ones.
HongPong@reddit
i would try to start a public business (or perhaps a cooperative depending) to offer local AI services to the people in the region
Training-Event3388@reddit
I'm just curious about the cost. This is amazing.
TOO_MUCH_BRAVERY@reddit
turbotax
JacketHistorical2321@reddit
Sell them all and save for M5 silicon.
Eden1506@reddit
At that point I would have bought 7x RTX 6000 pro instead tbh.
Kurcide@reddit (OP)
I have a 4x H100 NVL system and a GH200 in the same rack. The point of this was to build a scalable cluster that exceeded the unified memory pool of an RTX or H100 system. In the end, this is the cheapest way to scale to 2TB of memory in the Nvidia ecosystem.
MajorZesty@reddit
Still feels like you'll hit way too many network bottlenecks to effectively use it. That said, I hope you post your progress! I'd love to be proved wrong.
Possible-Pirate9097@reddit
jfc ok you win bro
Subject-Tea-5253@reddit
Can I get one, please?
bebackground471@reddit
ok, first of all, congratulations on the litter of cute, healthy little bundles of joy. Second of all, gimme two. I will care for them as if they were my own.
dtdisapointingresult@reddit
I mean what is there to think about? You can easily run the largest local model, GLM 5.1, at BF16 if you want (but obviously, do it at FP8).
Just try the biggest and baddest model from each top lab: Deepseek V4 Pro, GLM 5.1, Kimi K2.6. Qwen 3.5 397B is too small, I feel it would be a waste on your hardware.
Freonr2@reddit
I think you're going to be busy for a while trying to optimize that.
slindshady@reddit
Minesweeper
Foreign_Aid@reddit
With 2 TB of pooled memory, you have the physical capacity to load heavyweight models structurally equivalent to Gemini 1.5 Pro or early iterations of Gemini Ultra (as well as GPT-4 class architectures). Using 8-bit quantization (FP8), where one parameter equals 1 byte, you can deploy Mixture of Experts (MoE) models ranging from 1 to 1.5 Trillion parameters. You will still retain a massive memory buffer to handle an enormous context window (e.g., processing dozens of textbooks or huge code repositories simultaneously).
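A back-of-the-envelope version of that math (all figures approximate, ignoring activation memory and any per-node replication overhead):

```python
# Rough memory budget for the 16-node pool.
pool_gb = 16 * 128            # 16 Sparks x 128 GB unified memory ~= 2048 GB
params_b = 1_000              # a ~1T-parameter MoE, in billions of parameters
bytes_per_param = 1           # FP8: one byte per weight

weights_gb = params_b * bytes_per_param   # ~1000 GB of weights at FP8
headroom_gb = pool_gb - weights_gb        # ~1 TB left for KV cache, activations, overhead
print(f"weights: ~{weights_gb} GB, headroom: ~{headroom_gb} GB")
```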
StardockEngineer@reddit
You should run about 4 off to the post office and mail them to me.
TechieByChoice@reddit
A sale!?
Eugr@reddit
OP, I’m very curious how that would work. What switch are you going to use to connect all of them together? Please reach out to me in DM or on NVidia forums - we haven’t seen a 16 node cluster in the wild yet. Should still work fine with our community build: https://github.com/eugr/spark-vllm-docker
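For anyone else trying to reproduce a multi-node setup like this: vLLM's multi-node path usually rides on a Ray cluster, so it's worth confirming every node has actually joined before launching the server. A minimal sketch, assuming Ray has already been started on each Spark; the expected node and GPU counts are assumptions, and the exact launch flow depends on the build linked above:

```python
import ray

# Connect to an already-running Ray cluster (e.g. `ray start --head` on node 0,
# `ray start --address=<head-ip>:6379` on the other 15 Sparks).
ray.init(address="auto")

alive = [n for n in ray.nodes() if n["Alive"]]
total_gpus = sum(n["Resources"].get("GPU", 0) for n in alive)

print(f"nodes joined: {len(alive)}")       # expect 16
print(f"GPUs visible: {int(total_gpus)}")  # expect 16 (one GB10 per Spark)

# Only once all nodes report in does it make sense to start vLLM with
# tensor/pipeline parallelism spanning the cluster.
```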
g_rich@reddit
That's like over $70k of hardware, not including the switch or cables; DGX Sparks have their place but this certainly isn't it. For one, you'll never be able to scale past the memory bandwidth bottleneck, so you'll be stuck at 20-40 t/s; a Mac Studio cluster would give better performance for almost half the price. If you needed to stay within the Nvidia ecosystem, then a workstation built around a handful of RTX Pro 5000s or 6000s with a Threadripper and a good amount of RAM, along with maybe a few DGX Sparks if you needed to do anything with ConnectX, might have been a better investment.
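Rough sanity check on where that 20-40 t/s intuition comes from: decode is roughly memory-bandwidth bound, so per-node generation speed is capped around bandwidth divided by the bytes of active weights read per token. The figures below are ballpark assumptions, not measured numbers:

```python
# Decode is roughly memory-bandwidth bound: every generated token has to stream
# the active weights through the memory bus at least once.
bandwidth_gbs = 273        # approx. LPDDR5x bandwidth of a single GB10 node, GB/s
active_params_b = 32       # assumed active parameters per token for a large MoE, billions
bytes_per_param = 1        # FP8

bytes_per_token_gb = active_params_b * bytes_per_param
ceiling_tps = bandwidth_gbs / bytes_per_token_gb
print(f"single-node decode ceiling: ~{ceiling_tps:.0f} tok/s")
# Sharding across 16 nodes splits the weight reads, but interconnect latency and
# sync overhead eat into the theoretical 16x, hence the low-double-digit estimate.
```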
Out of curiosity why did you go this route?
On the upside you've made my investment in a Mac Studio, soon to be 2 Asus GX10's and 10 gig switch a lot more palatable.
jmakov@reddit
Deepseek v4 Pro, cheaper and with fewer timeouts than Ollama Cloud. Looking forward to you letting us try your new cloud 😁
codeninja@reddit
To my house with 2 of them.
SomeIngenuity1957@reddit
You should use it to play Minecraft
KingMitsubishi@reddit
Doom
Reasonable-Waltz7016@reddit
Double it and give it to the next person
staatsclaas@reddit
This has to be bait.
Dapper_Chance_2484@reddit
Why?
seanliam2k@reddit
What are you trying to achieve/do with this?
SerejoGuy@reddit
https://i.redd.it/xv62jte9f5yg1.gif
Irrealist@reddit
A giveaway.
jamesrggg@reddit
You should run towards some bitches
(nah im just playing, happy for you)
SirBardBarston@reddit
What is your use case?
kassandrrra@reddit
kimi k2
OleCuvee@reddit
a nuclear power plant 😀
qodeninja@reddit
nice man whats the plan with all this?
PrysmX@reddit
But will it run Doom?
jhenryscott@reddit
Minecraft!
cr0wburn@reddit
Doom
Pinzasca@reddit
This!
Luke2642@reddit
Just out of interest, why did you choose this? What was your economics calculation?
If I had the money, it'd be for https://tinygrad.org/#tinybox
Status-Secret-4292@reddit
I have to ask.
How much did this run you?
What do you actually do with LLMs?
What do you do for a living?
legatinho@reddit
Backstory?
astronomikal@reddit
16x of the best models swarming research
Snoo_81913@reddit
Whatever the hell you want LMAO wut. How the hell did you get 16x sparks? What do you guys do?
Possible-Pirate9097@reddit
Has to be Jensen's secret blood boy.
outtokill7@reddit
Doom
kimmich_kim@reddit
Hehe all of deep seek v4
Conscious-Map6957@reddit
A circus
DarthCalumnious@reddit
Minecraft
thari_mad@reddit
power station
reto-wyss@reddit
I want to know how noisy the switch is.
I'd only need a 100Gb switch and I'm wondering whether there are some that are not vacuum-cleaner level. I've simply been rolling direct connections with dual-port 100G cards, but of course that limits things to three systems.
Although, if I remember correctly, that may be a self-imposed restriction to keep a certain level of sanity.
Kurcide@reddit (OP)
way too loud to be out in the open. I have a custom soundproof rack and built an exhaust system to pump hot air outside the house.
reto-wyss@reddit
Let's just say the builders have been summoned to come and drill a hole already... But I'd like to avoid doing any serious soundproofing.
Did you have any issues with humidity? Dust filter on room intake?
Kurcide@reddit (OP)
Definitely need to clean the server regularly. Dust is always an issue. I have a 15k BTU unit cooling the room with a sealed exhaust to pump hot air out. I haven’t put a filter over the intake yet but I just keep it all clean and it’s been fine
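For anyone sizing a similar room, a rough heat-load estimate; the per-node draw and switch overhead below are guesses, not measured figures:

```python
# Rough heat-load check for a sealed rack of 16 Sparks (all figures assumptions).
watts_per_spark = 240          # ballpark wall draw per node under load
nodes = 16
switch_and_misc_w = 300        # switch, fans, etc. -- a guess

total_w = nodes * watts_per_spark + switch_and_misc_w
btu_per_hr = total_w * 3.412   # 1 W == 3.412 BTU/hr

print(f"~{total_w} W -> ~{btu_per_hr:.0f} BTU/hr")
# A 15k BTU unit covers that with a little margin, consistent with the setup above.
```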
halcyonhal@reddit
Would love more details on what you did with the rack and the exhaust.
ComplexType568@reddit
I don't think this is a "home" lab anymore... Though, what stopped you from assembling a few RTX pro 6000s? I feel like those would be easier to handle and are stronger?
Also try K2.6, GLM 5.1 and MiMo V2.5 pro... And maybe DeepSeek V4 Pro when it stabilizes. Qwen and MiniMax are probably too small for you
Kurcide@reddit (OP)
all about unified memory. I already have a 4x H100 NVL system in the rack. This is actually the easiest and cheapest way to get to 2TB unified memory short of buying a B300 server for $600k
ComplexType568@reddit
Interesting, I always thought M3U 512GB Macs were the cheapest route. Well, have fun with it! I'm still pretty concerned that running them all in parallel will be super slow, though, since they're spread out across nodes.
NetZeroSun@reddit
At some point we are going to have a bunch of techies and nerds sitting on a bed of DGX, NVME, or storage and flashing victory “gang” signs while looking all “you mad bro”, compared to rappers sitting on piles of cash.
linumax@reddit
Can it run crysis ?
abnormal_human@reddit
You can tell NVDA is at an all time high this week.
ajw2285@reddit
Crysis
ClassicalPomegranate@reddit
Google Chrome!!
Mugen0815@reddit
Start a github-copilot-replacement. We need one.
ResidentPositive4122@reddit
Read this article the other day, you should give it a brief look-over, might find some interesting things in it. They did 8x but most of the stuff was pretty interesting (especially the pre-setup, and what snags they hit along the way): https://www.servethehome.com/big-cluster-little-power-the-8x-nvidia-gb10-cluster-marvell-cisco-ubiquiti-qnap-arm/
reto-wyss@reddit
Thanks, that was interesting. I like servethehome, I just don't follow them closely for longer stretches. Good to see they actually know how to use the software and run proper concurrent workload tests - it's a rare sight, unfortunately.
Dany0@reddit
You should run to the hills because us GPU poors (I have a 5090) are gonna chase you n steal em
VoiceApprehensive893@reddit
finetuning on kimi
Repoman444@reddit
Let’s do a giveaway to people on this thread!
Nilosderzweite@reddit
What? The question is where 😅 or did you pay them already?
KyteOnFire@reddit
A bargain sale ?
amitbahree@reddit
I asked something similar - https://www.reddit.com/r/LocalLLaMA/comments/1su3tfb/what_do_you_want_me_to_try/
Elorun@reddit
Run? Run for the hills!
dedSEKTR@reddit
Give me just one? :/
Silver_Jaguar_24@reddit
Unsloth to fine tune some models?
sometimes_angery@reddit
A black market for DGX Sparks