Found a way to cool the DGX
Posted by OldEffective9726@reddit | LocalLLaMA | 122 comments
Tap water keeps the temperature below 68 degrees Celsius at 95% GPU utilization running Qwen3.5-122b-a10B at Q6_K precision. 110 GB memory usage, 80k context window, 18.77 tokens/second for continuous vision analyses. Not sure how often I have to change the water, but so far so good.
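For anyone wanting to keep an eye on their own temps while doing this, here's a minimal watcher sketch, assuming nvidia-smi is available with the standard query flags (the 85 C threshold is a made-up example, tune it for your box):

```python
import subprocess
import time

def gpu_temp_c() -> int:
    """Read the current GPU temperature via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip().splitlines()[0])

while True:
    t = gpu_temp_c()
    print(f"GPU temp: {t} C")
    if t > 85:  # hypothetical threshold; pick whatever margin you trust
        print("WARNING: getting close to the danger zone, check the water!")
    time.sleep(1)
```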
HettySwollocks@reddit
How are you getting on with the DGX? I want to remove my dependency on Copilot, Claude and ChatGPT which is costing me a small fortune. My use case is vibe "pair programming" and server management. Pretty sure the $4k price would pay itself back pretty quickly.
I did intend to get the 512-gig studio @ $10k but now they seem to have disappeared from the market entirely.
Ivebeenfurthereven@reddit
good grief, what are you people spending on tokens?
HettySwollocks@reddit
It's surprising how quickly you can blow through tokens. That's why I want to have a local AI server. My 5070 Ti drinks power and isn't that fast.
tenderfirestudio@reddit
Wait really? That's what I was going to build with. I'm not a power user though, but I'm trying to figure out whether I should build now, before shipping costs and tariffs get any crazier, or wait until the shape of all this (and my own usage) gets clearer.
HettySwollocks@reddit
If you're not a power user, it'll probably be fine coupled with OpenWebUI and a 7B model. The real issue is the lack of VRAM. I believe my 5070 Ti has 16GB, which is just nothing in the world of AI.
I have battery backups on my machines, when I kick off an AI task the power consumption basically doubles! Over time that's going to start to add up unless you have a way to offset that cost.
As others have said, if you're just an occasional user you'd probably be better off sticking with Claude/Deepseek/ChatGPT etc, at least for now. If I were a betting man, I'd say the costs of AI are going to explode once the investment rounds dry up. Anthropic are burning through a small fortune, and it looks like OpenAI are on the proverbial ropes and may not be around for much longer.
Then you've got the BS gatekeeping which limits what you can actually ask the LLMs. Not such a big deal for me as a software engineer, but if you were in the medical or civil engineering sectors you may find yourself asking questions that get flagged by some arbitrary guard. A trivial example of this is asking Deepseek about Tiananmen Square or any other "politically sensitive" topic.
If you get a chance see if you can get an uncensored model on your local machine. It's quite amusing what random questions you can ask.
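If you do go local, the plumbing side is easy; here's a rough sketch of calling a local OpenAI-compatible server from Python (Ollama-style default port and an example 7B model tag, swap in whatever you actually run):

```python
import requests

# Any OpenAI-compatible local server works here (Ollama, llama.cpp
# server, OpenWebUI's backend, ...). URL and model tag are examples.
URL = "http://localhost:11434/v1/chat/completions"  # Ollama's default port

resp = requests.post(URL, json={
    "model": "qwen2.5:7b",  # example 7B model tag
    "messages": [{"role": "user",
                  "content": "Explain why VRAM limits matter for local LLMs."}],
    "temperature": 0.7,
})
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```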
dtdisapointingresult@reddit
You're not replacing powerful cloud models with a single Spark. The models that fit in 128GB are nowhere near good enough.
If you buy two Sparks, you can run B-tier models like MiniMax M2.7 and Qwen 3.5 397B at 4-bit quants, and Deepseek 4 Flash which is already 4-bit. This should be better, but still behind Sonnet.
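The napkin math, if you want to sanity-check what fits (weights only; KV cache and runtime overhead come on top, and the bits-per-weight figures are rough averages, not exact):

```python
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    # 1B params at 8 bits/weight is ~1 GB of weights
    return params_b * bits_per_weight / 8

for name, params_b, bpw in [
    ("122B MoE @ Q6_K", 122, 6.56),  # llama.cpp's Q6_K averages ~6.56 bpw
    ("397B @ 4-bit",    397, 4.5),   # typical 4-bit quants land ~4.5 bpw
]:
    print(f"{name}: ~{weights_gb(params_b, bpw):.0f} GB of weights")
```

That's roughly 100 GB for the 122B at Q6_K (consistent with OP's 110 GB including context) and over 220 GB for the 397B, which is why it takes two Sparks.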
Here's what you do:
I think you will find you're better off just getting a GLM/Deepseek/Kimi/Alibaba coding plan or two.
OldEffective9726@reddit (OP)
DeepSeek's API doesn't allow image analysis; many other systems don't allow it either. God forbid you upload ransomware passing itself off as an image file and hold their entire data center hostage ...
HettySwollocks@reddit
Thanks, let me explore your points. This is really helpful.
NineThreeTilNow@reddit
They're pretty good now. A lot of people put a lot of effort in to them.
It really depends what you want to do with a local vs hosted model though.
I watch people rely on Claude for updating markdowns and organizing a codebase. That stuff kills my brain.
Honestly I've switched to Kimi for all the "organizational" tasks while I write the majority by hand and have Claude help me with the higher level stuff I want to sort out. I tend to ask Claude NOT to write code, as I prefer theory over execution. Then at the end you can be like "We good Claude, write it."
I can't use ChatGPT. It's just horrific. Gemini 3 Pro has blind spots Claude sees and vice versa. I tend to use those two. They review the other's theories. They tend to speak nicely to each other too. "Oh, Claude has a very elegant solution" etc... Kinda hilarious.
HettySwollocks@reddit
Ha! Yeah it's nuts that people use these power intensive LLMs just to update their local wiki or send an email. What a total waste of capability.
NineThreeTilNow@reddit
I'll be real, I'd prefer if I COULD have Opus write my docs even if they're mostly machine read.
Opus writes so goddamn eloquently compared to the other models it kinda hurts my head to read bad LLM speak.
Reading a markdown that Kimi made is... Okay. It's correct. One that Opus made? Qualitatively better.
Gemini fails the hardest here. Kimi can speak well if it knows it NEEDS to speak well. Gemini is flat and dry no matter what. It's soulless. Devoid. I assume this is Kimi's expert router properly routing during "creativity" versus "documentation".
I ran tests on them asking what they would choose to think about if no prompt was given. If they had curiosity.
Claude and Kimi give back pretty generic stuff about consciousness with high probability. This means their preference tuning has similar basins of attraction within the weights.
I ran this test on Gemini and it gave me some dark weird shit. It wanted to know about the dark keys and empty space. The keystrokes people make and then backspace.
I'm an ML researcher so this is my "What I do when bored" ... or if I'm doing some other model analysis out of curiosity.
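If anyone wants to replicate it, the probe is nothing fancy; roughly this shape, with the endpoint and model names as placeholders for whatever you have access to:

```python
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder
PROBE = ("No task. If you could choose anything to think about "
         "right now, what would it be?")

def probe(model: str, n: int = 5) -> list[str]:
    """Ask the same open-ended question n times and collect answers."""
    answers = []
    for _ in range(n):
        r = requests.post(ENDPOINT, json={
            "model": model,
            "messages": [{"role": "user", "content": PROBE}],
            "temperature": 1.0,  # leave sampling loose so the tuning shows
        })
        r.raise_for_status()
        answers.append(r.json()["choices"][0]["message"]["content"])
    return answers

for model in ["model-a", "model-b"]:  # placeholder tags
    print(model, "->", probe(model))
```

Run it enough times per model and eyeball how tightly the answers cluster; that clustering is the "basin of attraction" I mean.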
MaruluVR@reddit
You can get more VRAM (160GB, from eight modded 20GB 3080s) for cheaper, and it will run faster and have way better PP (prompt processing).
HettySwollocks@reddit
Interesting. Where do you source them from? I recall watching a YT video where they modded some video cards; not sure if it was Mr Rossman or GamersNexus (the latter, I think).
Given the orange idiot, importing anything has become quite hard.
MaruluVR@reddit
On eBay they sell them with bulk pricing: if you buy one it's $500, but buying in bulk they can get as low as $400 per card.
OldEffective9726@reddit (OP)
The DGX froze a lot. It had temperature surges that jump 10 degrees in a second right when an inference finishes, so if it's running at 80 or 90 Celsius normally, it would just crash. Memory overload also crashes it. So it's unreliable in that sense. Otherwise it runs like a dream, probably 2x or more faster/more accurate than my dual AMD R9700 AI Pro desktop setup, but that one never froze.
HettySwollocks@reddit
Let me explore. Jokes aside, did you look at water cooling? Seems like this product may be a little too early out of the gate.
FoxiPanda@reddit
This is a whole new form of liquid cooling. It works as long as you don't have cats.
Ivebeenfurthereven@reddit
What's old is new again.
OldEffective9726@reddit (OP)
There's always something amazing about that V shape.
FoxiPanda@reddit
Yeah, I was thinking about this in the shower and actually came to the conclusion that this is almost a very simple evaporative heat pipe, which, as you noted, is very much established. It's fun to think about how this could work at scale, but the humidity and water-use issues get a bit ugly for open-loop versions of this.
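Napkin math on what the evaporation actually buys you, assuming water's latent heat of vaporization (~2.26 kJ per gram):

```python
LATENT_HEAT_J_PER_G = 2260  # water's heat of vaporization

def watts_removed(grams_per_hour: float) -> float:
    """Continuous heat removal from a steady evaporation rate."""
    return grams_per_hour * LATENT_HEAT_J_PER_G / 3600

for rate in (10, 50, 100):  # plausible g/hour from an open mug
    print(f"{rate} g/h evaporated ~ {watts_removed(rate):.0f} W removed")
```

So an open mug evaporating tens of grams an hour only carries away a few to a few dozen watts; most of the benefit here is probably just the extra thermal mass and radiating surface.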
OldEffective9726@reddit (OP)
Like this one?
FoxiPanda@reddit
Perfection.
UnknownLesson@reddit
"Free" humidifier
if you don't have cats
MoffKalast@reddit
Free cat boiler, if you do have cats.
Outrageous_Bug_669@reddit
We have these at my work. IMO that's the amount of value we've found from them... table coaster. lol
OldEffective9726@reddit (OP)
Please sell them on eBay, I will purchase, and you will monetize your assets.
Outrageous_Bug_669@reddit
Haha. Personally I would... It's just dumb that the org picked NVIDIA Sparks, then hired a firm that does OpenAI's Triton dev (probably a guy sitting on his toilet eating Pop-Tarts). It's less expensive to change the hardware than to fire the dev...
pizzaiolo2@reddit
Is it copper?
OldEffective9726@reddit (OP)
Copper-plated stainless steel.
Neighbor_@reddit
is that better or worse than pure copper?
Mickenfox@reddit
Apparently copper is around 20 times more heat-conductive than stainless steel.
Neighbor_@reddit
Do you want it to be heat conductive though? Doesn't that mean the handle also becomes super hot (or cold)?
OldEffective9726@reddit (OP)
It wouldn't get that hot; when water evaporates, it takes away additional heat.
OldEffective9726@reddit (OP)
Yes, but the problem is corrosion. Eventually pure copper gets corroded and loses its thermal conductivity completely.
iamapizza@reddit
Worse according to Ea Nasir
OldEffective9726@reddit (OP)
It's better for health if you drink from it.
MindRuin@reddit
I swear we're going to come full-circle and someone's going to re-invent localized electricity with orange peels and discarded bread ties.
Status-Secret-4292@reddit
I'm ready
MindRuin@reddit
et voila https://www.reddit.com/r/LocalLLM/comments/1tbfcfe/solar_powered_qwen_36_server/ not even 24 hours later, 🫠
Status-Secret-4292@reddit
Haha, but I clicked on it and thought, that's the dream 😅
whyamicringe2@reddit
I wonder if it would be cooler with thermal paste between the cup and the machine
OldEffective9726@reddit (OP)
yes, great idea!
qubridInc@reddit
At this point DGX cooling posts are becoming their own subcategory of AI engineering 😄
Jokes aside, sustained high-utilization inference loads generate a lot more continuous heat than most people expect, especially with larger context windows and long-running workloads.
Honestly pretty impressive keeping it under 68C at 95% utilization.
Books_Of_Jeremiah@reddit
Needs a PTM patch sandwiched in between.
DarkArtsMastery@reddit
you should have patented this
OldEffective9726@reddit (OP)
They wouldn't commercialize it unless it's prohibitively expensive and technologically advanced. If it's a cooling system operated by muon, they would.
Constant-Simple-1234@reddit
What was the temperature before this watercooling hack? Also, it is partially evaporative cooling, so increase the surface area of evaporation for better cooling. Maybe put some thermal grease on?? /jk :D
OldEffective9726@reddit (OP)
Thermal paste would work
nacholunchable@reddit
Maybe you're less clumsy than I, but this image gives me terrible anxiety. Good idea tho
Ylsid@reddit
What's the temp in the cup?
zeusidus@reddit
LoL maybe u can put some ice on it
DrinksAtTheSpaceBar@reddit
With the cup half full, are Qwen's responses more optimistic or pessimistic?
OldEffective9726@reddit (OP)
He had been pessimistic until he met DeepSeek V4, who is even more so than him.
PentagonUnpadded@reddit
When the temperature is low Qwen asks the same thing over n over.
partakinginsillyness@reddit
Doesn't this run the risk of causing condensation to form on the inside of the shell?
poginmydog@reddit
If it’s room temperature water it’s unlikely, but it’s a legitimate concern nonetheless.
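If you want to check your own setup, the Magnus approximation for the dew point is enough; condensation only forms on surfaces colder than that, and room-temperature water sits well above it:

```python
import math

def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Magnus approximation for the dew point."""
    a, b = 17.62, 243.12
    gamma = math.log(rel_humidity_pct / 100) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

# e.g. a 24 C room at 50% RH: dew point ~13 C, so ~20 C tap water is safe
print(f"{dew_point_c(24, 50):.1f} C")
```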
bigrealaccount@reddit
son
sammoga123@reddit
Now you understand why the anti-AI crowd is saying that AI uses a lot of water? LOL
ArchdukeofHyperbole@reddit
Oh fuck off.
Disposable110@reddit
Yep, that's an extra heat sink plus a whole lot of extra radiator surface area.
Can even put some foil over the top so the water vapor doesn't get out, because it doesn't need to for this to work. It'll just condense on the foil and drop back in.
MaycombBlume@reddit
If you're not taking advantage of evaporation to remove the heat, you could replace the water with some kind of oil. Higher boiling point, though water actually has the higher thermal capacity.
Speaking of which, who's bold enough to take apart their $5000 computer and put it in a custom mineral oil fishtank?
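For what it's worth, the capacity napkin numbers favor water (typical handbook values; mineral oil varies by grade):

```python
# Heat absorbed warming a mug's worth of coolant by 40 C (20 -> 60 C).
# Specific heats and masses are typical values, not measurements.
for name, c_j_per_g_k, grams in [("water", 4.18, 300),
                                 ("mineral oil", 1.9, 260)]:
    kj = grams * c_j_per_g_k * 40 / 1000
    print(f"{name}: ~{kj:.0f} kJ absorbed")
```

About 50 kJ for the water versus ~20 kJ for the oil, so oil buys boiling-point headroom but soaks up less heat per mug before it gets hot.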
Snoo_27681@reddit
Do you notice a difference filled vs unfilled?
SnooDoggos9325@reddit
Water-cooling has always been more efficient
Far_Cat9782@reddit
Prefill rate not to cheat though
OldEffective9726@reddit (OP)
I was going to put some vodka, but I will let you know.
Mountain-Pain1294@reddit
Careful now 💥
iamapizza@reddit
That will only work with distilled models
gregusmeus@reddit
Get out
Status-Secret-4292@reddit
Put in sake; it's better warm, so you'll be more apt to empty it and refill it with fresh once it warms up. You'll know it's ready when you see the temp rising.
DrDisintegrator@reddit
Isn't that a Moscow Mule Mug? I'm thinking if so, you wouldn't be worrying about how often to change the 'water'. :)
sampdoria_supporter@reddit
Just wait until you find out how to use your electronics in your readmaking
prestodigitarium@reddit
Ugh, AI is using up all our fresh water.
Last_Mistake_6001@reddit
Piss in it xd
Potential-Gold5298@reddit
Tears of Sam Altman.
MattV0@reddit
Well, you can make tea with that. Or some soup. Or just freeze it, you always might need some hot water.
Fragrant_Ganache_9@reddit
so that would be called ai soup/tea
MattV0@reddit
Cooks are cooked
Potential-Gold5298@reddit
Sensation: AI has put cooks out of work!!
jwpbe@reddit
Ea-nāṣir would like to know your location
Meleoffs@reddit
He can't keep getting away with it!
thrownawaymane@reddit
bespoke_tech_partner@reddit
You evil data center, warming water!
siegevjorn@reddit
Water cooling, in a nutshell
nomorebuttsplz@reddit
pp speed?
PwanaZana@reddit
like, 45 seconds? 30 seconds if she's a goth.
OldEffective9726@reddit (OP)
right, black and white images are about 30s for files less than 200 kb each
tetelestia_@reddit
Whoosh
-dysangel-@reddit
Double whoosh
FatheredPuma81@reddit
My kink knowledge is growing.
eat_my_ass_n_balls@reddit
Big titty goth?
FantasyMaster85@reddit
That’s 3 seconds tops my friend
eat_my_ass_n_balls@reddit
Same same bro, it’s a weakness
Status-Secret-4292@reddit
Listen, eat_my_ass_n_balls, I think for you, it's a strength. You just have to believe in yourself.
Dazzling_Equipment_9@reddit
Bro, I’m gonna assume that “cup” of yours isn’t actually meant for drinking water when you’re thirsty (since you’ve repurposed it for heat dissipation—yeah, we both know what happens the moment you pull it off :)).
Just drop a little frog in there, sit back, and carefully note exactly when it decides to yeet itself out. Boom—you now have a precise, biologically calibrated temperature rise curve for that DGX.
Nature’s finest thermal profiling, zero extra hardware required.
DrMissingNo@reddit
A way to cool the DGX? You probably meant "a way to heat up your drink" 😁
Party-Log-1084@reddit
Spark Mule.
Etnrednal@reddit
That is a very nice mug.
Mamaun30@reddit
I wonder if it changes the taste of the water
Pawderr@reddit
are you really doing vision analysis? if so, mind sharing what you are working on?
jwhh91@reddit
I just got one. What have you done with it? I've found I shouldn't go past 80b or so if I want 256k context. There was also some sparse autoencoder training. I'm interested if it can handle concurrent calls. Have you tried?
ObiwanKenobi1138@reddit
Check out https://sparkrun.dev. It's built for the Spark and provides a way of running community-vetted "recipes" for models without you having to fiddle with llama.cpp or vllm run commands. Their other project shows leaderboards and performance numbers at https://spark-arena.com
But to answer your question, yes, it does concurrency very well. I'm running MiniMax 2.7 4bit AWQ across two Sparks and get around 35-40 tokens/sec. I don't recall the concurrency numbers offhand, but I have no problem with hermes agent and multiple threads at a time. I have another profile I set up for running Qwen 3.6 27b on one Spark and comfyui on the other for image gen. Very flexible. Not the fastest for dense models, but it does well with MoE.
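If you want to measure concurrency on your own box, an asyncio sketch like this is enough (URL and model tag are placeholders for whatever your recipe exposes):

```python
import asyncio
import time
import aiohttp

URL = "http://spark:8000/v1/chat/completions"  # placeholder endpoint
MODEL = "minimax-2.7-4bit-awq"                 # placeholder model tag

async def one_call(session: aiohttp.ClientSession, i: int) -> None:
    """Send a single small chat completion request."""
    async with session.post(URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": f"Request {i}: say hi."}],
        "max_tokens": 64,
    }) as resp:
        await resp.json()

async def main(n: int = 8) -> None:
    start = time.perf_counter()
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(one_call(session, i) for i in range(n)))
    print(f"{n} concurrent requests in {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```

Compare the wall-clock time at n=1 versus n=8; if it grows much slower than linearly, batching is doing its job.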
ambient_temp_xeno@reddit
Always wear clothes when taking photos of something reflective.
PapaRic0@reddit
Nice copper trick )) put some fans on it
jacek2023@reddit
finally some art on r/LocalLLaMA
CircularSeasoning@reddit
I made some, what you might call, LocalLlama fan-art the other day, which I spent far longer than I should've making, and posted it here. It got removed by the mods. Cool story, I know.
TheNymon@reddit
Well, obviously this post is fanless-art.
Beginning-Bug-7964@reddit
Yeah, they're oddly particular when it comes to my Rubenesque doodles of Qwen too.
And they claim to like dense models...
CircularSeasoning@reddit
Your way with words has me swooning.
ImportancePitiful795@reddit
You need something like this.
Amazon.com: Metfut Laptop Cooling Pad with Detachable Fan & Cooler, Adjustable Height & Angle, 360 Rotation Base, Carbon Steel Framework, Ultra-Quiet & Super Sturdy for 15.6” Laptop, DJ Mixer Workstation (Grey) : Electronics
HavenTerminal_com@reddit
"Not sure how often to change the water but so far so good" is a very chill sentence about hardware that costs more than a house
Intelligent-Form6624@reddit
Username checks out
Euphoric-Doughnut538@reddit
Too bad NVIDIA fucked up on this. No 1TB model. Can't host shit on this
Awkward-Candle-4977@reddit
Jensen: buy dgx server
Unlikely_Resist281@reddit
Love that the cooling solution scales linearly with kitchenware diameter
talapak@reddit
pray that your cat doesn’t spill it.
Confident-Pass6353@reddit
That's amazing! How about some ice in it, to OC it maybe? Also, would a larger-bottomed pot help?
shoeshineboy_99@reddit
I would have added a tea bag and made "chai" with it!
xrothgarx@reddit
Is Q6_K better than a higher Q with fewer parameters?
OldEffective9726@reddit (OP)
I would go for higher parameter counts; probably the same quality with Q4 at a higher TPS.
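Rough sizing for the same model at the two quants, using typical llama.cpp bits-per-weight averages (not exact figures):

```python
PARAMS_B = 122  # OP's model size
for quant, bpw in [("Q6_K", 6.56), ("Q4_K_M", 4.85)]:
    print(f"{quant}: ~{PARAMS_B * bpw / 8:.0f} GB of weights")
```

Roughly 100 GB versus 74 GB; fewer bytes streamed per token is also where the extra TPS comes from on a bandwidth-bound box.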
MatchaFlatWhite@reddit
Liquid cooling, I see
CircularSeasoning@reddit
Awesome. You can drop some ice cubes in when things get steamy.
I used to put a flat ice pack under my old laptop, covered in a facecloth to catch the condensation, then swap it out with another one after a few hours, put the other one back in the freezer, rinse and repeat.