16x DGX Sparks - What should I run?
Posted by Kurcide@reddit | LocalLLaMA | View on Reddit | 524 comments
Let’s build the biggest ever DGX Spark Cluster at home. This is going into my home lab server rack, 2TB of unified memory.
• 16x Sparks
• 1x 200Gbps FS 24 x 200Gb QSFP56 Switch
• 16x QSFP56 DAC cables
Should be all setup by tomorrow afternoon, what should I run?
MrAlienOverLord@reddit
16 .. damn - i only have 8 - glad you're putting in the r&d on bigger gb10 clusters - i was considering adding 8 more but given i only have the crs804-4ddq i would need 4 switches to get that wired up 6 4 4 6 (only 2 used) if i interconnect the switches with 400g - that'd be an additional 3k for the switches and 3k for the cables (ya the breakout cables are not that cheap lol)
please post benchmarks - also i'm sure thomas/azeez from atlas inference - particularly for the sparks - could get quite a bit more oomph out of those nifty devices
that being said i really hope someone cracks the firmware for connectx-7 so we can use regular IB vs ethernet
Kurcide@reddit (OP)
8 is still a huge cluster of these. Would love to compare notes
MrAlienOverLord@reddit
thats my wonky tower of gold - pre racking em up
anactualalien@reddit
Nice paperweights.
RedShiftedTime@reddit
Seeing this has made me realize I shouldn't be chasing hardware and should just be happy getting railed with whatever subscription plan the large providers offer. I was debating spending 10k on the new Mac Studio + some sparks for prefill, but seeing that all this hardware (over $70k worth) is only capable of running Kimi 2.6, it's like, ok sure, privacy, but having to spend $120k in hardware just to get reasonable speeds for these models? I'll just... pay for a sub or API access... I suppose.
caim2f@reddit
Wouldn't ASICs with baked-in models/weights be better? Maybe 30-40 years from now, if we can get the manufacturing process down right, we would have access to personal inference devices with lightning speeds. From what I've gathered they cost over $1,000,000 to produce currently. I can't help but have this gut feeling that GPUs are not the right way to approach this.
RedShiftedTime@reddit
I don't think we will be using LLMs, in the current sense, in this time frame. Something else will be here. General computing isn't going anywhere.
MisticRain69@reddit
The only time stupid expensive local hardware for the SOTA models is the right call is if you make a bazillion dollars from something that's a trade secret you can't use cloud models for, since that would just hand the cloud providers that trade secret.
sn2006gy@reddit
it’s not a trade secret if a token can do it
Psychological-Lynx29@reddit
I think the sweet spot is spending like 7 to 12k on a GPU or multigpu system. That way you can run 27b+ models with max context.
Serprotease@reddit
No point in chasing the latest SOTA with consumer/prosumer level hardware. There is, I think, a limit at around a 400b model (256gb ram/vram) for a usable local llm at an achievable price (less than 10k) with usable performance.
Going above that and you are looking at abysmal pp/tg, a crazy expensive (power and cash) system, and/or a kafkaesque setup.
TechTwentyTwo@reddit
You should run an analysis of your decision to purchase 16 Sparks
thecodeassassin@reddit
Right. See here's the problem.
I have a bunch of these as well and i really don't enjoy running kimi k2.6 or anything else large on it. Just too slow. I always fall back to my rtx 6000 pro cluster for literally anything serious.
cartographr@reddit
Just to ask - what was the primary purpose of this purchase? Lab, mini dev/test data center or production use? This is definitely scale out vs scale up structure (imho) i.e. run more capacity / more throughput rather than run one thing faster or a very large model very slowly.
Dry_Yam_4597@reddit
Sell them and get some H100s.
Kurcide@reddit (OP)
I have a 4x H100 NVL system already in the rack
xamboozi@reddit
I have no idea what that many DGX sparks would do for you that 4x h100's wouldn't.
The DGX spark doesn't have a lot of memory bandwidth and the 200Gbps links are even less throughput, so like.... Why?
Kurcide@reddit (OP)
Can’t run any SOTA open source models on 376gb Vram
thehpcdude@reddit
Would be cheaper and easier to just rent 8x H100's, especially when SOTA is going to be 1T+ params in the near future. Hopefully you didn't actually buy a bunch of sparks.
siete82@reddit
Also pay for the claude subscription, but that's the point of this sub
joefourier@reddit
A Claude subscription that goes offline for hours at least once a month, gets nerfed with the company denying it for weeks when they can't get enough GPUs, and wastes millions of tokens on thinking traces that you can't actually inspect?
illicITparameters@reddit
I can't wait for Anthropic, Google, Microsoft, and OpenAI to start charging by the token.
thehpcdude@reddit
To me the point is more what can I do with reasonable hardware or what hardware a common enthusiast can wield. I think the other half of the point is showing that smaller parameter models can do day-to-day actions with ease.
Buying a bunch of off the shelf hardware to run a SOTA model at home is a waste of not only money but time. Not sure why people think it's some sort of flex, but I may be biased because of my work.
Ok-Internal9317@reddit
True
bigh-aus@reddit
yeah not worth getting the H100s unless you already have them - H200 NVL is better - 4x 141gb, but the price vs 16 dgx sparks - $120k+ vs ~$64k...
Problem is you really need 8x H200s and a machine to use them - getting closer to b200 territory.
Relative_Rope4234@reddit
bro must be a millionaire
sourceholder@reddit
Not anymore.
Successful_Flow1329@reddit
Well, not if he was billionaire before.
VegetableDelay1658@reddit
Yeah this dude has watches that are more expensive than my life
Reasonable_Ad5611@reddit
not anymore
VirtualPercentage737@reddit
He just paid for Jensen's kid's college.
florinandrei@reddit
Or for the 17th alligator leather jacket.
SkyFeistyLlama8@reddit
Plot twist: OP is an Nvidia billionaire or hundred-millionaire, one of the early joiners with a ton of stock options.
Thicc_Pug@reddit
right, he used to be a billionaire
Thalesian@reddit
Just checked the post history and yup. At least.
xb4r7x@reddit
Like most people in the tech industry...
Deep90@reddit
Does it count if you have a million in debt?
Noiselexer@reddit
uhuh, why even bother with sparks then?
Dry_Yam_4597@reddit
Damn, that's nice.
quadiuss@reddit
Selling 16 of them just to get three H100s
woobchub@reddit
Away
SuperLucas2000@reddit
Chrome with 3 tabs, i dare you
Kurcide@reddit (OP)
I only have enough confidence for 2 tabs
jimmytoan@reddit
With 2TB unified memory across 16 nodes, the big unlock is running 671B+ parameter models at full precision with long context windows. The sm121 missing kernel issue is real though - older LLMs won't run without workarounds. Best bet right now is Kimi K2.6 with vLLM using eugr's nightly builds while the DeepSeek V4 PR gets merged. Prefill throughput will be exceptional but token generation will cap around 20 t/s regardless of node count - if generation throughput matters, hybrid with Mac Studios for the decode step.
Kurcide@reddit (OP)
that’s exactly what I want to do once M5 Ultras come out. Add some Macs to the rack
yammering@reddit
16 is, um, a lot. Kimi K2.6 runs very well on my eight node cluster with vLLM using eugr's nightly builds. There are unmerged PRs for Deepseek V4 for vLLM. Flash runs fine on 8x, Pro could fit on your 16. You will get monster prefill numbers, but no matter what you do, token generation will average around 20 t/s.
Kurcide@reddit (OP)
I’m hoping to eventually add Mac Studio M5 Ultras to this for token gen and have the Sparks be prefill
yammering@reddit
Do you know what software stack you'd use for that? The sparks are quirky in that even older LLMs like DeepSeek 3.2 don't run due to missing sm121 kernels for some types of attention. It'd be awesome to frankenstein that but i'm skeptical.
Xlxlredditor@reddit
I believe eXo supports doing prompt processing on the Sparks and then running generation on the M5 Ultras
-dysangel-@reddit
Whoah. I might have to try this with my M3 Ultra..
Xlxlredditor@reddit
Not yet apparently. I thought they already did but no
-dysangel-@reddit
They do: https://blog.exolabs.net/nvidia-dgx-spark/
TechTwentyTwo@reddit
They demonstrated it in October of last year and wrote that blog post stating that 1.0 would include capability for disaggregated P/D across Nvidia and Apple silicon, but when they released 1.0 in January 2026, it wasn't and still isn't included. There has been progress on it as recently as the last couple of days (PRs #1993, #2000), so it probably won't be too much longer before this is ready to ship. Keep an eye on PR #1776 on the Exo github
TechTwentyTwo@reddit
Not yet
Xlxlredditor@reddit
Crap. They wanted to, iirc?
Badger-Purple@reddit
nope
MrAlienOverLord@reddit
i think you are actually better off running raw vllm on the sparks then adding the macs to it - the exo approach with heterogeneous networks has massive latency to transfer the state, and to my understanding it's mostly llama.cpp that runs on those .. -> way way way too slow to be useful - their benchmarks don't tell the full story as they run llama.cpp on the sparks, which no one in their right mind would do
TechTwentyTwo@reddit
I am trying to set this up at this very moment. I have 4 Mac Studio M3 Ultra 256 GB coming. The first two will be here tomorrow and the other two in a week. I already have two DGX Sparks
averagepoetry@reddit
Please update if this works! I have m3 ultras as well and would love to pair them with the dgx spark.
Fit_Concept5220@reddit
For anyone interested, the estimated prefill for dense Gemma/Qwen would be around 130k t/s. That means a 100k prompt would be processed in literally a second. The estimated token generation on the (as of now hypothetical) M5 Ultra would be around 70-80 t/s on q4 quants.
I must admit to myself that I was deeply wrong about the dgx spark - this is a monster machine for a prefill cluster, and the setup with dgx plus studio is a genius example of out-of-the-box thinking. Thanks for sharing OP.
Kurcide@reddit (OP)
It’s absolutely possible to have a 16x cluster
vVolv@reddit
I'd love to learn more about how you're clustering them - I haven't looked too deeply into it, but I recall prior to launch it mentioned you could link two of them, and presumably it would be a limitation in the dgx OS. To be clear, I'm not saying it can't be done, I just would like to know how.
Badger-Purple@reddit
The switch he got will allow for that kind of cluster
vVolv@reddit
Yeah, I got that but I thought (obviously incorrectly) that they'd baked the limit into the software
Cane_P@reddit
They have never limited it, they just don't support it officially. Any problems are up to you to fix; they won't do it.
Sea-Replacement7541@reddit
Dumb question. But by prefill you mean the time to process the prompt?
So people count the time to process the prompt and then the time for token generation, which is the actual output?
illforgetsoonenough@reddit
Prefill = prompt processing. Decode = token generation.
More-Curious816@reddit
Yes. Both are important; if one is slow, your output is slow. Like the Spark has monster prefill but crappy tg, while MacBooks have crappy prefill but decent tg.
worldburger@reddit
How will you do that with Mac Studios?
Does EXO do disagg prefill-decode?
Capable_Site_2891@reddit
exolabs.net
worldburger@reddit
Does EXO now do disagg prefill decode?
MajorZesty@reddit
Their repo makes it sound like Linux support is currently CPU only and I can't find anyone talking about using disagg this way, only wanting to. Feels like there'd be a lot more info on this, but I'm still gonna dig some more.
NoFaithlessness9789@reddit
What about https://github.com/Scottcjn/exo-cuda ?
Badger-Purple@reddit
no one has replicated their “experiment” and I’m pretty sure it was more marketing than reality
Capable_Site_2891@reddit
There is less of a reason to do so now - with the m3 Mac vs the spark it was 11:1, with the m5 it's 3:1. If m5 ultras came in a 512gb configuration at a decent price point, the spark would be almost redundant for this.
ItzDaReaper@reddit
Which chat room? Can I join?
ifheartsweregold@reddit
2x Spark Owner here….all I can say is good fuckin luck with that.
ComfortablePlenty513@reddit
nvidia and mac are two entirely different stacks, so idk how you'll manage.
cwr252@reddit
Honest question: why not use API at this point? Is it because of privacy?
ServiceOver4447@reddit
why get married when we can fuck for a hundred bucks
AlienRedditMaster@reddit
Same answer ? to have kids ?
ServiceOver4447@reddit
you don't need to be married to have kids
FatheredPuma81@reddit
Because someday you might want to have sex again.
ServiceOver4447@reddit
best blowjobs are from some prostitutes, why? because they are experienced.
FatheredPuma81@reddit
Cause you want sex.
Gravefall@reddit
because condoms
pm_me_tits@reddit
Except in this analogy we're rawdogging the api (aka they can read your input)
cwr252@reddit
Fair point haha
SKirby00@reddit
I'm actually kind of curious about this myself, so I did the math. Here's a breakdown of why it could make sense for someone to do this. It makes a bunch of completely baseless assumptions that probably don't all hold true for OP.
He probably spent ~$75K USD on this before tax ($4,700 MSRP × 16 = $75,200). Given the size of the investment, I'm just gonna go ahead and assume that someone making this kind of purchase has a business and will be able to write this off as a business expense (or more likely, write off its depreciation over the next few years). Assuming they expense the depreciation and then recuperate the residual value in a few years (let's assume ~$3,000 USD in 3 years), these could easily have a true/effective cost closer to $4,700 - $3,000 = $1,700, and $1,700 × (1 - 0.30) = $1,190 per unit (this baselessly assumes it would be offsetting income that would otherwise be taxed at 30%), or $1,190 × 16 = $19,040 total. So in this hypothetical the cluster would have a ~$19K effective/net cost over 3 years (or ~$6.35K per year).
Now let's see how much API usage it takes to hit ~$6.35K per year. For Kimi K2.6, it's $0.95/1M input and $4/1M output (edit: I made a mistake here, see my note at the end). Baselessly assuming a ~3:1 input to output token ratio (this varies a lot by use case), that's about $6.85/4M tokens total, or about $1.71/1M on average (note however that there seem to be K2.5 providers that offer ~half this cost). At that price, they'd need to process ~3.7B tokens (at that same 3:1 ratio) per year to reach the same cost. If this cluster is running 365 days/year, that's ~10.15M tokens per day, or 423K tk/hr, or 7,050 tk/min, or 117 tk/sec. Considering this is for combined input and output, that feels very feasible to surpass with such a big node, but it also hinges on a 24/7/365 usage assumption which is likely unrealistic. There's one big caveat though... I didn't factor in electricity at all, and frankly I don't feel like it.
Anyway, with enough usage, the right tax/cost recuperation factors in place, and relatively affordable electricity, it's very possible for this to be comparable to cloud models in term of economics, at least for a business.
There are also other factors though. Off the top of my head, I can think of:
- Privacy re: valuable business information
- Privacy re: client or employee information (incl. possible contractual obligations/restrictions & legal requirements)
- Cost stability/predictability
- Different accounting treatment for investments vs operating expenses (varies greatly depending on where he's located)
- Response latency
- Independence / self-reliance
- Stability / predictability (quality won't suddenly change out of the blue, and they won't be forced off of one soon-to-be-discontinued model at an inconvenient time to optimize all their work around some new model)
- Better looking balance sheet with these assets on hand could feel more comfortable for investors or debtors
- More end-to-end control could mean better optimizations around caching, which could help reduce costs
Conclusion: the margins are pretty tight, but with enough utilization/uptime, this could achieve significant non-monetary benefits at a reasonably low relative cost increase, or potentially even a cost reduction compared to using an API. But this requires HEAVY utilization and reasonable electrical costs.
Wait a minute... I forgot to adjust the API cost for the ability to write it off as business expenses at a similar rate as the depreciation. I don't feel like adjusting the math on that, but it definitely does make it harder to achieve a similar cost. Not impossible though.
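If anyone wants to poke at the assumptions, here's the same break-even arithmetic as a quick script (same baseless numbers as above, electricity still ignored):

```python
# Rough break-even sketch using the same assumptions as above (MSRP, resale,
# 30% tax offset, Kimi K2.6 API pricing, 3:1 input:output ratio).
# All figures are illustrative, not OP's actual costs; electricity is ignored.

units = 16
msrp = 4_700                     # USD per Spark
resale_in_3y = 3_000             # assumed residual value per unit after 3 years
tax_rate = 0.30                  # assumed marginal rate for the write-off

net_per_unit = (msrp - resale_in_3y) * (1 - tax_rate)    # ~$1,190
net_cluster = net_per_unit * units                        # ~$19,040 over 3 years
per_year = net_cluster / 3                                # ~$6,350/year

# API side: $0.95/1M input, $4/1M output, 3 input tokens per output token
blended_per_m = (3 * 0.95 + 1 * 4.00) / 4                 # ~$1.71 per 1M tokens
tokens_per_year = per_year / blended_per_m * 1_000_000    # ~3.7B tokens
tokens_per_sec = tokens_per_year / (365 * 24 * 3600)      # ~117 t/s sustained 24/7

print(f"net cluster cost: ${net_cluster:,.0f} (${per_year:,.0f}/yr)")
print(f"break-even usage: {tokens_per_year/1e9:.1f}B tokens/yr ≈ {tokens_per_sec:.0f} t/s around the clock")
```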
ItzDarc@reddit
Don’t forget though that the cloud models are losing hundreds of millions every day and will likely be unable to sustain that for many more years. The price will likely go up especially for business application. They’re just currently in the “addict the culture” phase of this particular drug. I believe long term this will end up being the less expensive way to do it by far.
ThunderGeuse@reddit
He can actually write off 100% of the depreciation in the first year thanks to OBBBA extending section 179 expense deductions.
han4wluc@reddit
what about electricity costs?
Ok_Warning2146@reddit
Why not just buy 8xRTX 6000? That should be faster for both prefill and inference.
SKirby00@reddit
I don't feel like doing the math for that lol.
It's much less memory though and might not be able to fit the very biggest models that he wants to run.
Ok_Warning2146@reddit
It is 768gb. Good enough for a quant of kimi 2.6. You can also use it for computationally intensive video gen
Cane_P@reddit
Not as much memory? If you are already in this economic ballpark, then you could buy a DGX Station instead. It will definitely have more tokens per second than the Sparks. But I would probably wait for the next version, since the memory (that isn't HBM) has a lot higher bandwidth compared to the Blackwell version.
ormandj@reddit
Any idea when that might be coming?
werther41@reddit
We're currently building a Parabricks server; a clinical setting needs full data control - if you post patient data into any LLM through an API, you have no idea where it ends up. The setup we have costs around 50k-70k, 2x RTX Pro 6000 96 GB VRAM. This cluster setup has a lot more unified RAM.
AnonsAnonAnonagain@reddit
But for $85k it's confirmed he could have gotten an MSI DGX Station GB300, which would outperform 16x DGX Sparks, especially since the Sparks do not have commercial Blackwell (the Sparks are missing TCGEN05).
(What is TCGEN05?)
ClickClawAI@reddit
First off, great work on doing the maths.
But you also left out another reason to do local over api… it’s way more cool!
_BigBackClock@reddit
why do we buy cars instead of leasing?
Ok_Warning2146@reddit
Well, you can get a better car for the same money in the form of 8x RTX 6000.
nochkin@reddit
More like why we own a car instead of taking taxis.
muyuu@reddit
if you already have the hardware, why not?
cwr252@reddit
I can see that… just seems a bit expensive to buy it in the first place, doesn’t it?
muyuu@reddit
well, i'd say so, but there are definite advantages
you can run other configurations different than the ones offered by API, you can make it deterministic for instance which is useful for testing, you can rely on it being available in the future for specific workflows, etc etc
this is /r/localllama after all, you'd think people appreciate the possibilities
yammering@reddit
Where’s the fun in that? Also this is r/localllama not cloud :)
Roll_Future@reddit
I thought kimi k2.6 needs a monster with a shit load of ram and at least 2xh200. Am I missing something?
yammering@reddit
8 sparks is slightly less than 1TB of VRAM. That's enough for the 660GB of model weights and lots of KV cache. The downside is that you only get ~20 t/s generation.
TheAncientOnce@reddit
what kind of speed are you getting?
somatt@reddit
Can you give any advice for learning to run LLMs sharded across clusters
yammering@reddit
There are a lot of options, and unfortunately the docs online are often out of date. I prefer vLLM at the moment but ignore everything in their docs about Ray, it is terribly unreliable (at least on my sparks) and native clustering works better.
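If it helps, the offline vLLM API ends up looking roughly like this once the multi-node plumbing is in place - model id and parallel sizes here are just placeholders, not a tested recipe for the Sparks:

```python
# Rough sketch of vLLM's offline API for a sharded run - NOT a turnkey recipe.
# Assumes the multi-node backend (Ray or whatever you end up using) is already
# up across the nodes; the model id and sizes below are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="moonshotai/Kimi-K2-Instruct",  # placeholder: whatever big MoE you actually serve
    tensor_parallel_size=16,              # one GPU per node in a 16x cluster
    max_model_len=131072,
)

outputs = llm.generate(
    ["Explain what prefill vs decode means for a MoE model."],
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```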
somatt@reddit
Cool, if you do, please pm me a link, I would love to see it. I was looking at petal?
running101@reddit
How can you run k2.6 across multiple machines? what mechanisms do you use?
siete82@reddit
vllm
Porespellar@reddit
Sparkrun.dev
TokenRingAI@reddit
Is that token generation number with or without speculative decoding?
yammering@reddit
Without.
bick_nyers@reddit
What's the prefill speed for Kimi? Are you using NVFP4?
yammering@reddit
kimi is natively int4 so i just kept it at that for accuracy. about 1500-1600 pp t/s at max context size.
Pupsi42069@reddit
Factorio
severemand@reddit
Reddit, is this a new trend that this generation is doing instead of super or muscle cars?
People buying stockpiles of compute and then going to reddit to flex and ask what they should run on them?
Run what you bought them to run, probably?
ChocomelP@reddit
Imagine you could buy a Bugatti and then actually drive it everywhere at max speed all the time.
Direct_Turn_1484@reddit
Dude. How are you linking them? Daisy chain them all together or do you have a 16 port 200Gbps switch?
Kurcide@reddit (OP)
I bought one of these:
https://www.fs.com/products/352159.html?now_cid=4319
Deep90@reddit
The city is going to think you're growing weed with all the heat and power usage lmao.
ChocomelP@reddit
I just found the perfect cover. I should start a weed farm to hide the fact that I'm running GPUs.
SharpSharkShrek@reddit
If you don't mind me asking: why do you "need" all this hardware? Wouldn't it be much more cost effective to use online services if you're not selling AI solutions somehow and just using them?
Direct_Turn_1484@reddit
Nice. Wish I could have my own small scale data center.
Status-Secret-4292@reddit
I have to ask.
How much did this run you?
What do you actually do with LLMs?
What do you do for a living?
DownSyndromeLogic@reddit
I'm pretty sure you already have an idea of what you're gonna run. I mean, why else would you spend fifty or a hundred thousand dollars on all this equipment? You didn't just do it to post on Reddit and ask us what to do. Tell us what you're actually going to run.
Endless7777@reddit
Why did you buy them??
Playful-Cat-4226@reddit
u should run for president.
CubicalMoon@reddit
How do you end up with $75000 worth of tech and no idea what you actually want to achieve with it?
nickN42@reddit
Mate, are you a kid or something? Guy clearly does this professionally, he's here just to flex on us, poors. I would absolutely do the same in his situation.
electrosaurus@reddit
He's not the one that sounds like a kid.
Low-Boysenberry1173@reddit
Professionally? What the heck can you do with these pieces in a professional environment? This is far from any professional context. It is just a bingo bullshit setup for fun.
electrosaurus@reddit
These are worse than AI bot slop posts and should be banned from the sub, really.
ThisWillPass@reddit
People spend the same on cars and rarely even drive them, which has been normalized for a long long time unfortunately.
SleepAffectionate268@reddit
but that car may lose, what, at most 50% of its value in a few years? The dgx sparks will be worthless in a few years, because we will have way higher ram and compute, as with all tech, but with cars it depends
fitechs@reddit
You don't have a car to drive it all the time, but to drive it when you need to
Successful-Total3661@reddit
Approximately how much power will it draw to run this cluster?
NetZeroSun@reddit
I know this is some serious flexing but I have to ask. What is this all for, honestly, and how did you pay for it / what's your job?
VegetableDelay1658@reddit
Check his posts, bro drives an aston and a lotus and wears AP and rolex
bobdvb@reddit
He was also into crypto.
Also collectables.
Also stocks.
I can't decide if he got fortunate along the way or just follows the wind with someone's money.
uhuge@reddit
*think* The more you buy..
ICanSeeYou7867@reddit
Honestly....
I would set them up as kubernetes worker nodes with the nvidia gpu operator and the Kai scheduler... if the gpu operator node supports the GB10.
However you wouldn't be able to "combine" them easily. But it would be interesting!
norskyX@reddit
Adobe Flash /s
MotokoAGI@reddit
Ken, please stack the DGX Sparks on the shelves. The store is opening in 15 minutes.
PrestigiousDrag7674@reddit
I gotta show this on Reddit
beryugyo619@reddit
make no mistakes and make sure to include free tungsten cubes
Firewormworks@reddit
Hahaha
drox63@reddit
Let me get this pic out for the gram first Phill.
Raredisarray@reddit
Lmfao
Hearcharted@reddit
🤣
PrestigiousDrag7674@reddit
Let’s see your racks
Turbulent-Walk-8973@reddit
I have a single DGX Spark, and I never managed to get above 45 t/s with qwen3.6-35b-a3b at Q8. Am I doing it right? I see so many people with 80+ on RTX GPUs for qwen3.6-27b, so I feel something is wrong somewhere. Or the dgx spark is the wrong thing to buy.
GabryIta@reddit
Kimi 2.6/GLM5.1!
Powerful_Evening5495@reddit
Send a brother one, and I will pray for you.
"May God increase his tokens to infinity."
sxt87@reddit
But why?
Revolutionary_Rub530@reddit
Gemma 4
Hambeggar@reddit
Aren't these kinda shit since they don't have TMem.
amp804@reddit
minecraft let the different models form clans lol
aomogol@reddit
Tetris 😄
FederalSun@reddit
Give me one lol
Necessary_Pride1093@reddit
doom of course
SnooDogs7747@reddit
Lowest settings
AcreMakeover@reddit
Might be able to handle medium if you're ok with 30 FPS.
Either_Audience_1937@reddit
At 480p
Intelligent-Staff654@reddit
To my place with 1 or 2 of them to drop off
_ytrohs@reddit
To the bank
spliffsandshit@reddit
Unfortunately this is going to be painfully slow and inefficient. Processing speed will be great though.
Fearless_Weather_206@reddit
Did you buy them at discount?
admiral_corgi@reddit
Probably going to need to upgrade your electrical lol, this looks like an insane amount of power draw
Kurcide@reddit (OP)
Already have a newly run sub panel in the house with 240V circuits
optomas@reddit
All that, and no 3p 480V?
VestedLoves@reddit
The crypto/nft loser to AI loser pipeline is real.
CrypticZombies@reddit
check yo electric bill
mr_zerolith@reddit
Return them and get 4 RTX PRO 6000's.
384gb of vram is pretty decent, and you'll have about the same performance as 8 of those.
JustTesting314@reddit
Send me one, I'm struggling with my 24GB of VRAM. Invest in my business 😁. That being said, try Deepseek Pro.
Master_Zack@reddit
sir are you a billionaire
jinnyjuice@reddit
What are you going to run them for?
Your choices are probably going to be between MiMo V2.5 Pro, DeepSeek V4 Pro, GLM 5.1, MiniMax M2.7, depending on the answer. DGX Spark's bandwidth is not that high, so go with a 4 bit quant AutoRound, vLLM if multiple users, SGLang if single user or two maybe three depending on usage intensity of each user.
Kurcide@reddit (OP)
This is all actually good advice. Appreciate it.
I was going to run Deepseek, i'm trying out SGLang on 8 of the nodes now but it looks like there are still some issues with SM121
BIOffense@reddit
Honestly the only actual advice you've gotten in this thread, but really, it's extremely rare to find someone with this hardware, let alone 16 of them. Generally, you should hire.
One SGLang on 8 unified nodes, right?
For which model? What errors do you see?
Turbulent_War4067@reddit
I have seen quite a few folks advise him to run Doom. It seems to be the most popular answer :)
Kutoru@reddit
There are no "issues" with SM121. It will work, you just don't get optimized inference paths if running an outdated version or not supported.
Anything used by localllama can likely be finagled to work if you dive into the weeds, most of the time just recompiling for the CUDA arch or doing tiny changes like adjusting for the SM cache sizes (which are smaller).
ocassionallyaduck@reddit
Run from the banks.
They're going to repo your house.
Dry_Shower287@reddit
I think even though 20 Sparks and one DGX Station are the same price, the Station offers much better value because of its insane speed.
No-Comfortable-2284@reddit
can 270gb/s bandwidth rly run anything at meaningful speeds
spaceman3000@reddit
I have a strix halo, similar bandwidth. Dense models are slow but MoE are fast even at 120B
No-Comfortable-2284@reddit
yea, gpt oss 120b only processes ~5b params during inference so it is fast, but what do you do with 16 dgx sparks.
spaceman3000@reddit
No. Maybe the guy works for Nvidia and got them free, or this is marketing (he posted on several subs), or his company bought them for developers and he will just do a temporary cluster for fun.
No-Comfortable-2284@reddit
gotcha, thanks for that. yea, pretty much my thought. didn't think you could run anything at meaningful speeds that requires 16 dgx sparks. really enjoying my spark though. it's so quiet and nice to just have on 24/7, fine tuning and just hosting my website.
Vancecookcobain@reddit
Goddaaaammnn....how much wattage is that?
On the flip side you probably can run DeepSeek v4 pro right? Well whenever it comes out if the weights haven't been released yet
Alternative_You3585@reddit
Bro 💀
Just run Kimi and be happy, tho I assume the speeds are gonna be slightly painful
sn2006gy@reddit
i pay 80 bucks for unlimited kimi basically.. that’s less than the electricity for those machines to be on
Kurcide@reddit (OP)
The entire system is 200Gbps node to node. Eventually I want to see if I can use these for prefill and cluster Mac Studios in for token gen after the new ones come out eventually
ceinewydd@reddit
NVIDIA wired this with PCIe 5.0 x4 to the SoC, so it's 200G in terms of what links up to the switch, but practically speaking it hits 109Gbps and runs out of gas. Patrick from STH covered this in a video about clustering eight units together recently.
Kurcide@reddit (OP)
I confirmed on my current 8x Spark cluster. Single 200G cable per node, FS N8510 switch running RoCEv2 with PFC/ECN, MTU 9000.
The PCIe 5.0 x4 ceiling is real but NVIDIA did something weird with the wiring. Each physical QSFP port is fed by two separate PCIe x4 links that show up as twin logical RDMA devices in the OS (rocep1s0f1 and roceP2p1s0f1). So that ~111 Gbps cap is per x4 link, not per cable.
Saturate both x4 links across the single cable (NCCL_IB_HCA pointing at both twins) and you get ~199 Gbps through one physical port. NVIDIA basically split one 200G port across two PCIe x4 paths because they couldn't give it x8 lanes.
Per-flow workloads still cap at ~111 Gbps. Per-node aggregate gets to 92.5% of theoretical 200G if you use both twins. NCCL handles it transparently with NCCL_IB_HCA=rocep1s0f1,roceP2p1s0f1.
So the 200G is real, you just have to know how to actually extract it.
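For anyone trying to replicate, the relevant setup looks roughly like this - the twin device names are what show up on my nodes, while the RoCE GID index and bootstrap interface name are just examples you'd need to verify on your own boxes:

```python
# Sketch of the NCCL environment described above. The NCCL_IB_HCA value uses
# the twin RDMA devices that appear on the Sparks; the GID index and bootstrap
# interface name are assumptions/examples - check `ip addr` and your NIC config.
import os
import torch.distributed as dist

os.environ["NCCL_IB_HCA"] = "rocep1s0f1,roceP2p1s0f1"   # saturate both PCIe x4 links
os.environ["NCCL_IB_GID_INDEX"] = "3"                    # RoCEv2 GID index (commonly 3; verify per NIC)
os.environ["NCCL_SOCKET_IFNAME"] = "enp1s0f1"            # example bootstrap interface, adjust per node

# rank / world size come from torchrun or env vars as usual
dist.init_process_group(backend="nccl")
```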
thehpcdude@reddit
Why not actual IB? RoCE is meh and introduces latency that you don't want. IB is dead simple.
ragingpanda@reddit
I think the connectX firmware for the dgx spark is hard locked to ethernet. Not sure if anyone has gotten around that (yet)
CommunicationOld9889@reddit
Clusters are so slow.
burger4d@reddit
Please post some performance numbers after you get everything setup, I’m very curious
ATK_DEC_SUS_REL@reddit
You’re gonna go far kid.
comp21@reddit
I thought the largest cluster you could make is eight? How is 16 going to work?
CreamPitiful4295@reddit
My amp goes to 11
DukeOfPringles@reddit
One problem, if you're in America at least: your wall circuit will blow if about 12 of them run at a load of 120 watts each, so either you have two independent circuits near each other (with nothing else plugged in) and a REALLY long network cable to attach the routers, or you own the home and got an electrician to do some rewiring. I can think of a lot better ways to spend 64k.
Kurcide@reddit (OP)
I own a home and had an electrician add a dedicated panel and 2 240 industrial outlets for the rack
DukeOfPringles@reddit
My level of jealousy is so damn high, I have to limit my setup to not trip the breaker.
codingafterthirty@reddit
I want to be DGX Sparks rich. And that is awesome. Would be interesting to compare a large DGX cluster vs a Mac Studio cluster. Lol, me, I am just rocking an AGX Orin 64gb. Slow as hell, but gets the job done.
My_unknown@reddit
Try creating AI slop and post the videos to social media to buy more of them
My_unknown@reddit
Run a courier website and send them to me 😁😁
Wubbywub@reddit
a charity
Fancy-Restaurant-885@reddit
Jesus fucking Christ, just - how do people have so much money just burning a hole in their pocket?
MisticRain69@reddit
And here I thought the dudes making the dual rtx pro 6000 rigs were rich. Damn, this dude makes the guys with those rigs look like us poors.
Badger-Purple@reddit
Motherfucker also bought all the reaper sauce.
Kurcide@reddit (OP)
lmao that was 6 years ago man… I just wanted that delicious reaper sauce. It was the closest thing we had in like a decade to the volcano sauce
Ok_Warning2146@reddit
Would this setup be faster than a 1.5TB RAM + one RTX 6000 setup?
Kurcide@reddit (OP)
the 1.5tb of ram won’t help in that example. only the 6000 pro
the 6000 is faster than a spark but the sparks just have so much more unified memory
Yosanga@reddit
Donate
Ikkepop@reddit
a fucking hedge fund dude, you seem to be loaded
Full-Sense5308@reddit
This is no longer local llama 😂
thawizard@reddit
OP going full localDataCenter.
johnnyhonda@reddit
Why would you buy 16x DGX Sparks, and then go to reddit to ask people what to run on them?
Kurcide@reddit (OP)
for Karma?
thawizard@reddit
Can I have one when you’re done playing with them? 😅
Mythril_Zombie@reddit
You didn't buy them, how did you get them?
Kurcide@reddit (OP)
I definitely bought them
cddelgado@reddit
Can I have one? Rather, can my university have one?
Live-Possession-6726@reddit
Atlas - atlasinference.io
Porespellar@reddit
Why did you not opt for a GB300 DGX Station? They are out now from several vendors and I think are running about $90K
PrysmX@reddit
That's still "only" 768GB lmao.
Kutoru@reddit
Just for some clarity. The support for mixed bandwidth workloads is extremely poor (outside of .cpu()) and rightfully so as it is not worth the complication to support.
It is better to treat it as a 252GB HBM2E GPU and a 496GB LPDDR5X GPU. Then there are also time-sharing complications, and you have to be very careful to make sure the LPDDR5X data doesn't go through HBM2E before hitting the GPU - as you'd want a similar experience to DGX Spark.
PrysmX@reddit
Yep, I just didn't want to overcomplicate my response since 768 < 2048 anyway. 😄
MajorZesty@reddit
Hm, only numbers I've seen is closer to $150k, but they're all custom talk to sales stuff. Haven't seen anyone post actual quotes.
pheoxs@reddit
https://configurator.exxactcorp.com/configure/VWS-158270643
95k for 496gb of ram and 252gb of hbm3e
MajorZesty@reddit
Nice! Thanks for the link. It's cheaper than I thought it'd be. Not that it's in my budget lol
pheoxs@reddit
Have you considered cutting avocado toast out of your budget /s
Sad-Enthusiastic@reddit
Roblox
thefox828@reddit
Did you get a better price ordering so many?
Kurcide@reddit (OP)
yes, got them slightly below original retail. So saved like $550+ on every node
Blackdragon1400@reddit
I’m still mad losing $700 buying my 2 sparks a week apart after the price hike
DaMan123456@reddit
Whatever the hell you want! lmao :D
Allseeing_Argos@reddit
What should you run? You should run from me.
Smultar@reddit
I'd kill to get one of those, but cant afford em
Kinky_No_Bit@reddit
16..... 16.... @ how much a piece? $4,699.00 .... sooooo..... $$$ 75,184 dollars.... O.o
somerussianbear@reddit
Run back to the shop to return this crap.
epSos-DE@reddit
Gemma 4 IS GOOD!
Kimi is good!
The online version of Kimi is better than Claude, because it reasons better, BUT fanboys are going to hate if you say it!
prince_pringle@reddit
Serve chess agent matches! I just finished a rust/gpu chess engine for the spark
emteedub@reddit
Chain 14 together, then send 2 my way
miltonthecat@reddit
First go watch this video from ServeTheHome which is the closest thing you’ll get to an instruction manual for a cluster of this size.
https://youtu.be/uYepcMoqvKQ?si=73k7DjTk-HqgPEON
SanDiegoDude@reddit
Dude I love my DGX, I develop on it constantly and it's rad... but it's ungodly slow. I can only imagine what trying to run a massive model that the 2TB would support would be like, when I get impatient just waiting on Qwen 27B to hurry tf up, lol. I'm jealous, but also please please please share what your actual t/s numbers are once you can run one of those open source monsters that are dropping out of China.
SpearHook@reddit
My hat's off to you. I have one hooked up, working on my second. Do you need a dedicated/special power hookup for that many rigs?
ArthurParkerhouse@reddit
lol, is this from spare pocket change, or a 2nd mortgage?
Ok_Campaign6438@reddit
Doom in 4d
kyr0x0@reddit
AFAIK you can only pair 2?
Kurcide@reddit (OP)
with a switch you can pair as many as you want
Embarrassed-Rip-3205@reddit
Bro, reading your posts... did you get rich from dogecoin?
Kurcide@reddit (OP)
lmao did you go back 8 years in my posts?
markstar99@reddit
At this point you can train AGI on your own
deepsky88@reddit
my mom
LavenderDay3544@reddit
Doom
jaysin144@reddit
Extension cords and air conditioning.....
LifeguardPuzzled3212@reddit
yourself outside to touch some grass
InfiniteClick@reddit
Shouldn’t you have been asking that question… before ?
firest3rm6@reddit
Minecraft Server
TheyCallMeDozer@reddit
You should run one over to me in the post lol ... isn't that like $80,000+
utf16@reddit
Will it run Crysis at full 4k?
Kurcide@reddit (OP)
I need 16 more sparks for that
thebloodreaper6739@reddit
im curious, what does your work look like to make use of hardware at this scale ?
philmarcracken@reddit
The most rich phrasing ever. None of the rich ever do anything manual themselves lol
Kurcide@reddit (OP)
I’m literally crawling behind the rack and doing it myself. No fun if someone else did it
oftenyes@reddit
I thought you could only connect two sparks formally and three informally. Is that not true anymore?
Kurcide@reddit (OP)
nope, with a switch you can connect as many as you want
Foreign_Aid@reddit
Inversion: The Shortest Path to Disaster

The most direct way to burn $100,000 with zero usable results is assuming this setup will function like an enterprise data center. Here is exactly why trying to run a massive 1T model across this cluster for real-time chat will systematically fail:

The Communication Bottleneck: Professional nodes use NVLink, offering speeds around 900 GB/s. This cluster communicates over copper ethernet cables at 200 Gbps (yielding roughly 25 GB/s of actual throughput). If you shard the weights of a massive model across 16 nodes, transferring activations over the network will take significantly longer than the compute itself. The system will technically work, but latency will render it practically useless.

Compounding Second-Order Costs: 16 compute nodes running 24/7 plus a high-throughput switch will generate a continuous multi-kilowatt power draw. This will rapidly max out your residential electrical infrastructure and mandate an immediate, expensive, and loud dedicated cooling setup, completely defeating the purpose of a "home" lab.
Powerful_Ad8150@reddit
Nah, maybe he simply lives in Europe? 16 x 150W = 2400W, lots of spare wattage for other stuff. And we're still talking about a single socket and a single 16-amp fuse, while typical new connections where I live are like 20kW.
Serprotease@reddit
You're replying to an AI comment with messed up markdown formatting. And it's quite wrong/out of date. NVLink is not really a thing anymore, and 200Gb/s is enough for a 1T model with 8-node clusters, per other users' experiences.
Mother-Agent7445@reddit
So why 16 sparks? Does this make the gpu bandwidth equivalent to like perfect inference response for lots of people?
Due-Opportunity6212@reddit
Question, I am a beginner, how the hell would this be clustered? Also, is the latency terrible?
And lastly, I wanna do that too, is it good?
Due-Opportunity6212@reddit
Will there be a video too? I wanna see it badly if it works.
Techngro@reddit
"You're a rich girl, and it's gone too far cause you know it don't matter anyway..."
onewheeldoin200@reddit
Jesus dude 😂
FusionCow@reddit
This is kinda ridiculous, I mean honestly the only models TO run are kimi k2.6 and deepseek v4 pro
patricious@reddit
You just called us poor in 16 ways.
TheWhiteKnight@reddit
if you want to feel poor go here -> https://www.reddit.com/r/Salary
Firewormworks@reddit
Wow, that did make me feel poor... Should have been a dentist.
More-Curious816@reddit
Most of the posts there are fake, like 250k? 400k? 700k?
TheWhiteKnight@reddit
FAANG salaries can be nuts, for example
Darkoplax@reddit
We really need /r/poorlocalllama
Minipiman@reddit
Doom
HoldAdministrative85@reddit
Money run money
poopsinshoe@reddit
Minesweeper
Select-Dirt@reddit
You can now goon at the speed of light! Congrats, you made it
iam-not-a-monkey@reddit
Doom perhaps?
mistrjirka@reddit
tell me this is ragebait XD how the fuck can you buy 16x dgx sparks and not know what to do with them. Like deepseek v4 or GLM 5
Legitimate-Pumpkin@reddit
I’d go for GLM AND DS4… AND M2.5 and Kimi 😂
TheDiamondSquidy@reddit
Money i’ll never get to enjoy
Fluffywings@reddit
A giveaway for everyone in this post!
All jokes aside the biggest open source model that fits.
Torodaddy@reddit
Oh you rich rich
dr_hamilton@reddit
You should run... a giveaway competition for folks here 😁
IrisColt@reddit
oops, deleted
AdventurousVast6510@reddit
all that money spent just to talk to an ai girlfriend faster
Kurcide@reddit (OP)
My AI girlfriend will be so smart thgh
IrisColt@reddit
heh
darkscreener@reddit
A simulation of the universe
villefilho@reddit
Minecraft, recommended settings
Toto_nemisis@reddit
Doom, that's what I would run
Adorable_Weakness_39@reddit
Space Cadet Pinball
ElChupaNebrey@reddit
Why not to make a test of all?
TwofacedDisc@reddit
Doom
Kutoru@reddit
I'm confused about the reason anyone would actually even consider 16x DGX Spark cluster for individual use. The DGX Spark is more suitable for larger inferences but that's just relative to its own inference performance.
Even for say clustering workloads, you can verify everything you need to on a 2x system (there are far more issues that can happen but those generally lie outside of the model-land).
There's nothing particularly special about 400gbps? Sure you don't see it on a consumer board but 400gbps is ~50GB/s and PCIE 5x16 has ~64 GB/s. So you can just sacrifice a PCIE slot for a Mellanox adapter.
ycnz@reddit
An ebay auction?
Final-Frosting7742@reddit
Run deepseekv4 at 0.5 token/s!
leopold815@reddit
Crysis
lqstuart@reddit
crysis
I_EAT_THE_RICH@reddit
You should run to the local charity and make a donation
the3dwin@reddit
There are a lot of comments so perhaps someone already made the comment:
If possible, use half of them to serve inference for the top 8 models that fit the hardware and run something like chatgimmy.ai, where you offer the use of the hardware over an API like OpenRouter.
Then for the other 8, I suppose put away 4 as backups and use the other 4 for everything and anything you can think of.
Heisenberg99_1_@reddit
Uncensored hentai models
skmagiik@reddit
Can you please run benchmarks bringing up various cluster sizes? I'm curious how much performance (tokens/sec) you get per DGX.
drox63@reddit
Why go this route and not get a full rack setup? I mean I know why I would want to do this… but why are you doing it?
Also could I have dibs on any units you will be decommissioning?
Kurcide@reddit (OP)
I have 8 a6000 ADA for sale that never got used
drox63@reddit
Can you dm me the link? Assuming the dgx spark is much cheaper on power and other infra to run, is that your theory?
Kurcide@reddit (OP)
https://a.co/d/0385KmCK
It’s slightly cheaper but tons more memory
Signal-Run7450@reddit
Run for your life, you might get attacked soon😂
uIDavailable@reddit
Idk why OP is asking this. They are cross posting in other subs with different titles and descriptions of the same exact post.
Kurcide@reddit (OP)
I posted one other time in homelab lol
FlyingDogCatcher@reddit
Can I come play at your house?
Kurcide@reddit (OP)
sure, come on down
forestryfowls@reddit
How does the high bandwidth networking work for this? Can you connect all 16 on one switch or do you need multiple? Can't wait to see updates! Just saw Serve the Home's write-up on 8 of these and that looked like a fun time.
Kurcide@reddit (OP)
a single 200Gbps switch with 16 ports is what you need. I have an FS with 24 ports
burnt1ce85@reddit
Holy smokes. How much did this cost you? Over $70k?
Kurcide@reddit (OP)
yes
Thistlemanizzle@reddit
Run an eBay API connection, sell all of it, and switch to API tokens.
lannistersstark@reddit
You're going to run a very very large model at 10 tps?
Kurcide@reddit (OP)
yup, and eventually see if I can just use the entire cluster for prefill
YairHairNow@reddit
Jeez, can we be friends? It's like having a friend with a pinball machine collection.
But yeah, I'd go deepseek, Kimi, and check out Nvidias gaussian splatting/3d asset harvesting models. Doing benchmarks on NVFP4 would be cool.
Hour_Bit_5183@reddit
LOL how rich are people in here? My god. spending all this money to make ????????. This is either a troll or a really dumb person who somehow made money.
Kurcide@reddit (OP)
Just dumb I guess. Gunna go cry into my GPUs now
LankyGuitar6528@reddit
Have you seen Elon? Being really dumb seems to be a prerequisite to making money these days.
vulcan4d@reddit
Nvidia Ad
Cellsus@reddit
Tetris
Prince_ofRavens@reddit
If you don't already have the answer to that question and a backlog of a couple months of answer to that question I feel like you made the wrong choice lol
JuniorDeveloper73@reddit
The world... millions starving and very few...
realzequel@reddit
So none of us should buy luxury goods? Slippery slope.
theowlinspace@reddit
No, but it's important to acknowledge unequal wealth distribution throughout the world. If you can't directly do anything to change it, the least you can do is recognize your privilege.
While you buy luxury goods, others are much less fortunate and are struggling to make ends meet. You can't change this, because it's principally a result of capitalism, and even donation only delays the problem, but understanding that your privilege is only the result of the plight of others, instead of just blindly saying "Well, I can buy luxury goods, what do I care if people are starving", is much more moral.
realzequel@reddit
Who's to say OP doesn't contribute a lot to charity? Want to change the world (or US at least)? Stop voting for the rich party (not you personally but people in general). Criticizing a reddit post is doing jackshit.
theowlinspace@reddit
Contributing to charity doesn't solve the root of the issue, and it's beside my point, I did mention that donation only tries to temporarily patch/delay the problem, not that it's a solution. Voting hasn't and won't ever change anything, the world will only change when the people organize.
I haven't criticized OP or any reddit post, all I'm saying is that recognizing the issue and hoping for a better future is better than trying to ignore it.
realzequel@reddit
So an armed revolution? Won’t happen in the age of entertainment. Too much bread and circus. Peaceful protests have done nothing as well. So your solution is unrealistic imho. The OPP was criticizing the post, not you.
MrHaxx1@reddit
Do starving people eat DGX Sparks, or what are you suggesting?
Kurcide@reddit (OP)
Just tried, they don’t taste good
DataPhreak@reddit
Oof.... bad deal. You could run A LOT of small models at a medium speed, or 3 Kimis at a snail's pace.
Stunning_Habit_6411@reddit
Use it to generate an image with 32
DarkShadder@reddit
I am new to this sub, are people of this sub really this insane?
Anarchaotic@reddit
Outside of pure "big model!!!", I'm really curious to see how concurrency works. This is a use-case for a small team that wants to focus on local-first, and so I'd love to understand how 4-5 different users would be able to send concurrent requests, or even what the realistic cap is for work.
Let's say you have 4 devs working on a codebase at the same time, does something like this give enough headroom for them all to have stronger models all working in tandem?
Turbulent_Pin7635@reddit
Run a shop
celsowm@reddit
Doom 1993
Codename280@reddit
You should run to the bank and donate some money. Cause clearly you're spending it without purpose.
Bozhark@reddit
Agentic subagents that operate individual agents per agentic service
PiratedComputer@reddit
Run Doom
nopanolator@reddit
You have the hammers, who care about nails lmao this fcked world now
gobblegoooblegobble@reddit
DMing you
LankyGuitar6528@reddit
Local instance of Mythos? Take over the world maybe?
aka_blindhunter@reddit
You run to me need the address
chensium@reddit
lol what!
typical-predditor@reddit
You should run one over to me.
_p00@reddit
Would love to run Pac-Man 256 on it. Should go fast.
BrianJThomas@reddit
Sometimes I'm tempted to do something like this. I'd probably have to pull power off of the dryer outlet in my 1br apartment. I wonder if anyone else is doing this...
RelationshipLong9092@reddit
It has to be GLM-5.1, at a total weight size of 1.51 TB.
You can fit Kimi K2.6 on just 8x Sparks, and other people have done so before. Boring!
But I've never seen anyone set up a 16x cluster, so you'd be the first (I've seen) to run GLM 5.1 locally on "consumer" hardware.
Beginning_Bed_9059@reddit
An electrical grid
Glazedoats@reddit
GLM 5 AND the new Deepseek. 🤭
AsiancookBob@reddit
You probably answered this somewhere in this thread, but I thought you could only link up to two nodes per the Nvidia docs? Unless they scaled it up.
TheRiddler79@reddit
If this is true, this guy just realized a nightmare
TheRiddler79@reddit
Mimo 2.5
Cless_Aurion@reddit
I've never seen such a WRONG comment section.
The answer is DOOM.
brenden77@reddit
You're wrong. It's FarCry.
CyborgBob1977@reddit
Looks like anything you damn well want...
shadowmage666@reddit
See if crysis works
Familiar-Virus5257@reddit
I laughed way too hard at this bc I am too old. I remember the days of "but can it run Crysis?"
BakeMajestic7348@reddit
Bro is older than 13
HIGH_PRESSURE_TOILET@reddit
It actually probably does tbh. There's a list of some popular games (though not Crysis) with approximate fps figures on the DGX Spark in the recent steam arm64 snap thread: https://discourse.ubuntu.com/t/call-for-testing-steam-snap-for-arm64/74719
Mochila-Mochila@reddit
Bro, you're crazy.
That's all and have fun 💪
tauronus77@reddit
Can it run Crysis?
Ok-Hotel-8551@reddit
Minecraft
redilaify@reddit
the hell do you do for a job
ACIeverName@reddit
Run the latest deepseek v4-pro-max
ChiptuneXT@reddit
Qwen 3.6 plus :) so in reality awesome to see multi agent with top tier models like Kimi
DeadLolipop@reddit
Surely for the price of these, you could get a big fuck off nvidia server
akumaburn@reddit
This may actually be a better use case for node-level MoE (a different model per Spark and a router that decides which model to send each request to).
You could also agent swarm smaller models as well.
MajorZesty@reddit
Did you compare purchasing this vs a DGX Station? Ofc, thinking about it this is probably still 3/4ths the cost depending on the switch.
SnottyMichiganCat@reddit
Yea. At this scale I think that would make more sense.
But, maybe weird circumstances made these available. 😅
Freonr2@reddit
"Funny story, there was a truck overturned on the side of the road..."
wazabee@reddit
Crysis?
SarcasticOP@reddit
This is the what they would steal off a truck if The Fast and the Furious was made today.
_BigBackClock@reddit
madlad
NBelal@reddit
Minecraft of course
eecchhee@reddit
World of Warcraft Classic
Rude_Ambassador_6270@reddit
run for your life!
sultan_papagani@reddit
any chance youre looking to adopt a fully grown adult?
bartskol@reddit
You should run charity and donate one to me.
cauchy2k@reddit
i would use them to watch youtube and netflix
marutthemighty@reddit
Are you starting a video game company? Or are you building a new AI company?
Sanity_N0t_Included@reddit
What should you run? Apparently a payday loan operation since you have the big bucks. 🤣
duhballs2@reddit
to my house with one of them.
Ok_Technology_5962@reddit
Deepseek v4. Show us how its done
SGAShepp@reddit
From your wife
Historical-Internal3@reddit
Think I would have gone with a DGX Station - just around $20k shy (lol).
Especially if you were considering adding Mac Ultras to the cluster.
Will easily pull 3-4k watts with this
geldonyetich@reddit
Same answer as to what a two-ton gorilla is to do: whatever you want.
Highest scoring large open model weights on artificial analysis are probably your best bet. Check weekly, it'll change.
Granted this being an Nvidia kit you might want to keep the latest Nemotron loaded. Sometimes the biggest flex is what you can do with the least effort.
createthiscom@reddit
How the hell would you power that
trueimage@reddit
Hi can you send one over? DM for my mailing address.
spencer_kw@reddit
run a routing benchmark. put 5 models on it, same prompts, compare quality and speed across task types. that's the data nobody publishes and it's worth more than any leaderboard. tools like openrouter and routers like herma let you A/B test models against each other on real workloads, that's where the interesting numbers come from.
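something like this against whatever OpenAI-compatible endpoint the cluster exposes (vLLM provides one) would get you started - model names and the URL below are just placeholders:

```python
# Rough sketch of the A/B idea: the same prompts against several locally served
# models, recording latency and output length. Endpoint URL and model names are
# placeholders; quality scoring would be layered on top of this.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
prompts = ["Explain RoCEv2 vs InfiniBand in two sentences.",
           "Write a SQL query that deduplicates a users table."]
models = ["kimi-k2", "glm-5", "deepseek-v4"]   # whatever the cluster actually serves

for model in models:
    for p in prompts:
        t0 = time.time()
        r = client.chat.completions.create(model=model,
                                           messages=[{"role": "user", "content": p}])
        dt = time.time() - t0
        text = r.choices[0].message.content
        print(f"{model:12s} {dt:6.2f}s {len(text):5d} chars  {p[:40]}")
```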
Accomplished_Steak14@reddit
OP could feed the whole of Africa with compute but chooses to run open clown at home
bromatofiel@reddit
But does it run Doom?
jd52wtf@reddit
Sell them all and buy a B300.
g_rich@reddit
OP dropped some cash on these, but a DGX B300 is upwards of half a million, which would be well over 5x what OP invested here.
jd52wtf@reddit
Thank you for that analysis.
Fit-Palpitation-7427@reddit
B300 is not 2tb of vram
eallim@reddit
Can you let 50 instances of Nemotron 3 Nano Omni talk with each other?
moonrust-app@reddit
qwen 3.6 27b
Skylleur@reddit
Can I please have one :3
rallypat@reddit
Holy fuck I’m poor
arm2armreddit@reddit
If you already have an H100, just give it to some school kids or donate it to a university.
ayu-ya@reddit
You can send one to me, I'll call it my Sparky and take very good care of it-
Racer4711@reddit
a truck.
Stike_1@reddit
What should you run - run your Lamborghini, I guess :)
7657786425658907653@reddit
dude your ai girlfriend must be so quick at tokens
More-Curious816@reddit
I, also, want this guy AI girlfriend.
Helicopter-Mission@reddit
I hope you’ve not spent all this for inference only
Healthy-Nebula-3603@reddit
For llms those are useless.
Antique_Juggernaut_7@reddit
What an awesome project. Congrats.
I imagine you know about all of this, but here goes just in case:
Just make sure you follow the discussions on Nvidia's dev forum on the Spark. There have been a ton of issues that Nvidia has left unresolved in the GB10; some of them even touch the consumer/workstation Blackwell product lines. The most important one is the most vexing for Nvidia, which is that NVFP4 is NOT natively supported, for a couple of reasons -- some of them software-related (I think these are mostly issues with CUTLASS at the moment), but some of them hardware-related (the GB10 actually doesn't have 5th gen Tensor Cores and that causes problems). These have been going on for a year now and the community is definitely frustrated.
Having said that, I am a happy owner of the two Sparks I own. If your project involves a lot of input tokens and/or a lot of concurrent requests, then a Spark cluster is very hard to beat.
Dependent-Wonder1366@reddit
turn it into a gaming rig
skydiver19@reddit
How long did it take for shipping and where did you source them?
machyume@reddit
Hey, when are you going to buy the power plant for your rig?
dataexception@reddit
Ummm... Pretty much anything you want. ;)
Anime-Man-1432@reddit
Marathon 🏃
unintended_purposes@reddit
https://huggingface.co/poolside/Laguna-XS.2
stormy_waters83@reddit
You should run to the post office and mail me one. Please and thank you.
Comfortable-Tie2933@reddit
🤫🥶
somnamboola@reddit
you should run yourself into a safe neighborhood
Substantial-Tax406@reddit
WHAT DO YOU DO FOR LIVING ?!!
misha1350@reddit
Palantir perhaps
Ok-Internal9317@reddit
💀
Deep90@reddit
His uncle invented Pokemon
htownclyde@reddit
trust fund
sparkleboss@reddit
Doom
billy_booboo@reddit
Send a couple to me, please
UnbeliebteMeinung@reddit
This is beautiful
charliex2@reddit
should get the asus ones instead, they're $1k cheaper and just have a smaller base drive. plus the thermals seem to be better - my gold sparks run way hotter than the asus ones.
HongPong@reddit
i would try to start a public business (or perhaps a cooperative depending) to offer local AI services to the people in the region
Training-Event3388@reddit
I'm just curious about the cost. This is amazing.
TOO_MUCH_BRAVERY@reddit
turbotax
JacketHistorical2321@reddit
Sell them all and save for M5 silicon.
Eden1506@reddit
At that point I would have bought 7x RTX 6000 pro instead tbh.
Kurcide@reddit (OP)
I have a 4x H100 NVL system and a GH200 in the same rack. The point of this was to build a scalable cluster that exceeded the unified memory pool of an RTX or H100 system. In the end, this is the cheapest way to scale to 2TB of memory in the Nvidia ecosystem.
MajorZesty@reddit
Still feels like you'll hit way too many network bottlenecks to effectively use it. That said, I hope you post your progress! I'd love to be proved wrong.
Possible-Pirate9097@reddit
jfc ok you win bro
Subject-Tea-5253@reddit
Can I get one, please?
bebackground471@reddit
ok, first of all, congratulations on the litter of cute, healthy little bundles of joy. Second of all, gimme two. I will care for them as if they were my own.
dtdisapointingresult@reddit
I mean what is there to think about? You can easily run the largest local model, GLM 5.1, at BF16 if you want (but obviously, do it at FP8).
Just try the biggest and baddest model from each top lab: Deepseek V4 Pro, GLM 5.1, Kimi K2.6. Qwen 3.5 397B is too small, I feel it would be a waste on your hardware.
Freonr2@reddit
I think you're going to be busy for a while trying to optimize that.
slindshady@reddit
Minesweeper
Foreign_Aid@reddit
With 2 TB of pooled memory, you have the physical capacity to load heavyweight models structurally equivalent to Gemini 1.5 Pro or early iterations of Gemini Ultra (as well as GPT-4 class architectures). Using 8-bit quantization (FP8), where one parameter equals 1 byte, you can deploy Mixture of Experts (MoE) models ranging from 1 to 1.5 Trillion parameters. You will still retain a massive memory buffer to handle an enormous context window (e.g., processing dozens of textbooks or huge code repositories simultaneously).
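A back-of-the-envelope version of that math (all figures approximate, ignoring activation memory and any per-node replication overhead):

```python
# Rough memory budget for the 16-node pool.
pool_gb = 16 * 128            # 16 Sparks x 128 GB unified memory ~= 2048 GB
params_b = 1_000              # a ~1T-parameter MoE, in billions of parameters
bytes_per_param = 1           # FP8: one byte per weight

weights_gb = params_b * bytes_per_param   # ~1000 GB of weights at FP8
headroom_gb = pool_gb - weights_gb        # ~1 TB left for KV cache, activations, overhead
print(f"weights: ~{weights_gb} GB, headroom: ~{headroom_gb} GB")
```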
StardockEngineer@reddit
You should run about 4 off to the post office and mail them to me.
TechieByChoice@reddit
A sale!?
Eugr@reddit
OP, I’m very curious how that would work. What switch are you going to use to connect all of them together? Please reach out to me in DM or on NVidia forums - we haven’t seen a 16 node cluster in the wild yet. Should still work fine with our community build: https://github.com/eugr/spark-vllm-docker
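For anyone else trying to reproduce a multi-node setup like this: vLLM's multi-node path usually rides on a Ray cluster, so it's worth confirming every node has actually joined before launching the server. A minimal sketch, assuming Ray has already been started on each Spark; the expected node and GPU counts are assumptions, and the exact launch flow depends on the build linked above:

```python
import ray

# Connect to an already-running Ray cluster (e.g. `ray start --head` on node 0,
# `ray start --address=<head-ip>:6379` on the other 15 Sparks).
ray.init(address="auto")

alive = [n for n in ray.nodes() if n["Alive"]]
total_gpus = sum(n["Resources"].get("GPU", 0) for n in alive)

print(f"nodes joined: {len(alive)}")       # expect 16
print(f"GPUs visible: {int(total_gpus)}")  # expect 16 (one GB10 per Spark)

# Only once all nodes report in does it make sense to start vLLM with
# tensor/pipeline parallelism spanning the cluster.
```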
g_rich@reddit
That's like over $70k of hardware, not including the switch or cables; DGX Sparks have their place but this certainly isn't it. For one, you'll never be able to scale past the memory bandwidth bottleneck, so you'll be stuck at 20-40 t/s; a Mac Studio cluster would give better performance for almost half the price. If you needed to stay within the Nvidia ecosystem, then a workstation built around a handful of RTX Pro 5000s or 6000s with a Threadripper and a good amount of RAM, along with maybe a few DGX Sparks if you needed to do anything with ConnectX, might have been a better investment.
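Rough sanity check on where that 20-40 t/s intuition comes from: decode is roughly memory-bandwidth bound, so per-node generation speed is capped around bandwidth divided by the bytes of active weights read per token. The figures below are ballpark assumptions, not measured numbers:

```python
# Decode is roughly memory-bandwidth bound: every generated token has to stream
# the active weights through the memory bus at least once.
bandwidth_gbs = 273        # approx. LPDDR5x bandwidth of a single GB10 node, GB/s
active_params_b = 32       # assumed active parameters per token for a large MoE, billions
bytes_per_param = 1        # FP8

bytes_per_token_gb = active_params_b * bytes_per_param
ceiling_tps = bandwidth_gbs / bytes_per_token_gb
print(f"single-node decode ceiling: ~{ceiling_tps:.0f} tok/s")
# Sharding across 16 nodes splits the weight reads, but interconnect latency and
# sync overhead eat into the theoretical 16x, hence the low-double-digit estimate.
```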
Out of curiosity why did you go this route?
On the upside you've made my investment in a Mac Studio, soon to be 2 Asus GX10's and 10 gig switch a lot more palatable.
jmakov@reddit
Deepseek v4 Pro, cheaper and with fewer timeouts than Ollama Cloud. Looking forward to you letting us try your new cloud 😁
codeninja@reddit
To my house with 2 of them.
SomeIngenuity1957@reddit
You should use it to play Minecraft
KingMitsubishi@reddit
Doom
Reasonable-Waltz7016@reddit
Double it and give it to the next person
staatsclaas@reddit
This has to be bait.
Dapper_Chance_2484@reddit
Why?
seanliam2k@reddit
What are you trying to achieve/do with this?
SerejoGuy@reddit
https://i.redd.it/xv62jte9f5yg1.gif
Irrealist@reddit
A giveaway.
jamesrggg@reddit
You should run towards some bitches
(nah im just playing, happy for you)
SirBardBarston@reddit
What is your use case?
kassandrrra@reddit
kimi k2
OleCuvee@reddit
a nuclear power plant 😀
qodeninja@reddit
nice man whats the plan with all this?
PrysmX@reddit
But will it run Doom?
jhenryscott@reddit
Minecraft!
cr0wburn@reddit
Doom
Pinzasca@reddit
This!
Luke2642@reddit
Just out of interest, why did you choose this? What was your economics calculation?
If I had the money, it'd be for https://tinygrad.org/#tinybox
Status-Secret-4292@reddit
I have to ask.
How much did this run you?
What do you actually do with LLMs?
What do you do for a living?
legatinho@reddit
Backstory?
astronomikal@reddit
16x of the best models swarming research
Snoo_81913@reddit
Whatever the hell you want LMAO wut. How the hell did you get 16x sparks? What do you guys do?
Possible-Pirate9097@reddit
Has to be Jensen's secret blood boy.
outtokill7@reddit
Doom
kimmich_kim@reddit
Hehe all of deep seek v4
Conscious-Map6957@reddit
A circus
DarthCalumnious@reddit
Minecraft
thari_mad@reddit
power station
reto-wyss@reddit
I want to know how noisy the switch is.
I'd only need a 100Gb switch and I'm wondering whether there are some that are not vacuum-cleaner level. I've simply been rolling direct connections with dual-port 100G cards, but of course that limits things to three systems.
Although, if I remember correctly, that may be a self-imposed restriction to keep a certain level of sanity.
Kurcide@reddit (OP)
way too loud to be out in the open. I have a custom soundproof rack and built an exhaust system to pump hot air outside the house.
reto-wyss@reddit
Let's just say the builders have been summoned to come and drill a hole already... But I'd like to avoid doing any serious soundproofing.
Did you have any issues with humidity? Dust filter on room intake?
Kurcide@reddit (OP)
Definitely need to clean the server regularly. Dust is always an issue. I have a 15k BTU unit cooling the room with a sealed exhaust to pump hot air out. I haven’t put a filter over the intake yet but I just keep it all clean and it’s been fine
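For anyone sizing a similar room, a rough heat-load estimate; the per-node draw and switch overhead below are guesses, not measured figures:

```python
# Rough heat-load check for a sealed rack of 16 Sparks (all figures assumptions).
watts_per_spark = 240          # ballpark wall draw per node under load
nodes = 16
switch_and_misc_w = 300        # switch, fans, etc. -- a guess

total_w = nodes * watts_per_spark + switch_and_misc_w
btu_per_hr = total_w * 3.412   # 1 W == 3.412 BTU/hr

print(f"~{total_w} W -> ~{btu_per_hr:.0f} BTU/hr")
# A 15k BTU unit covers that with a little margin, consistent with the setup above.
```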
halcyonhal@reddit
Would love more details on what you did with the rack and the exhaust.
ComplexType568@reddit
I don't think this is a "home" lab anymore... Though, what stopped you from assembling a few RTX pro 6000s? I feel like those would be easier to handle and are stronger?
Also try K2.6, GLM 5.1 and MiMo V2.5 pro... And maybe DeepSeek V4 Pro when it stabilizes. Qwen and MiniMax are probably too small for you
Kurcide@reddit (OP)
all about unified memory. I already have a 4x H100 NVL system in the rack. This is actually the easiest and cheapest way to get to 2TB unified memory short of buying a B300 server for $600k
ComplexType568@reddit
Interesting, I always thought M3U 512GB Macs were the cheapest route. Well, have fun with it! I'm still pretty concerned that running them all in parallel will be super slow, though, since they're spread out across nodes.
NetZeroSun@reddit
At some point we are going to have a bunch of techies and nerds sitting on a bed of DGX, NVME, or storage and flashing victory “gang” signs while looking all “you mad bro”, compared to rappers sitting on piles of cash.
linumax@reddit
Can it run crysis ?
abnormal_human@reddit
You can tell NVDA is at an all time high this week.
ajw2285@reddit
Crysis
ClassicalPomegranate@reddit
Google Chrome!!
Mugen0815@reddit
Start a github-copilot-replacement. We need one.
ResidentPositive4122@reddit
Read this article the other day, you should give it a brief look-over, might find some interesting things in it. They did 8x but most of the stuff was pretty interesting (especially the pre-setup, and what snags they hit along the way): https://www.servethehome.com/big-cluster-little-power-the-8x-nvidia-gb10-cluster-marvell-cisco-ubiquiti-qnap-arm/
reto-wyss@reddit
Thanks, that was interesting. I like servethehome, I just don't follow them closely for longer stretches. Good to see they actually know how to use the software and run proper concurrent workload tests - it's a rare sight, unfortunately.
Dany0@reddit
You should run to the hills because us GPU poors (I have a 5090) are gonna chase you n steal em
VoiceApprehensive893@reddit
finetuning on kimi
Repoman444@reddit
Let’s do a giveaway to people on this thread!
Nilosderzweite@reddit
What? The question is where 😅 or did you pay them already?
KyteOnFire@reddit
A bargain sale ?
amitbahree@reddit
I asked something similar - https://www.reddit.com/r/LocalLLaMA/comments/1su3tfb/what_do_you_want_me_to_try/
Elorun@reddit
Run? Run for the hills!
dedSEKTR@reddit
Give me just one? :/
Silver_Jaguar_24@reddit
Unsloth to fine tune some models?
sometimes_angery@reddit
A black market for DGX Sparks