Side Projects.
Posted by apollo_mg@reddit | LocalLLaMA | 44 comments
Little something I put together to play with for larger contexts than my 9070xt.
8700k, dual P100's, 16gb DDR4, 32gb Optane, Samsung sata SSD. Nothing too fancy.
Anyone else do a recent build? How's it working out?
CatTwoYes@reddit
Dual older cards (P100/P40 class) really are the value sweet spot right now. 32GB+ VRAM for under $200 is wild. With MoE offloading you can run 27B models at usable speeds and it handles coding + tool calling fine. The only real downside is prompt processing: once context hits 32k+, you start feeling it. But for the price of a single mid-range gaming GPU you get a 24/7 inference box. Hard to argue with that math.
FullstackSensei@reddit
I'll see your twin P100s and raise you eight P40s (no risers).
FatheredPuma81@reddit
The lighting makes it look like you're using a 600W PSU lol.
FullstackSensei@reddit
With vanilla llama.cpp that would actually be more than enough to power it during inference. With ik_llama.cpp, it can get close to 1 kW. So, the PSU never really breaks a sweat.
Dry_Yam_4597@reddit
Whoaa? How do you cool them?
FullstackSensei@reddit
You know how the internet is made of a series of pipes? You see that long tube on top of the cards? It connects to the interpipes and cools the cards for free!!!
Dry_Yam_4597@reddit
That's a good roast.
FullstackSensei@reddit
Sorry! Didn't mean it as a roast.
If you haven't read my responses to the other comments, they're watercooled.
Dry_Yam_4597@reddit
I did indeed. What I meant to ask was which waterblock was used, but after a day of battling Claude's regression in quality, all that my mind could squirt was the comment above. Thank you for sharing the info!
apollo_mg@reddit (OP)
LOL. I feel like Gemini is about 10x dumber than it was 3 months ago.
Dry_Yam_4597@reddit
I suspect they all are. The vram shortage is affecting everyone and the average user wants cat pictures and vibe coded html, and they are great at that.
itssethc@reddit
There's a water block attached.
Dry_Yam_4597@reddit
What I meant to ask is what waterblock they used.
Shot_Restaurant_5316@reddit
Where did you get a proper cooler for these cards?
FullstackSensei@reddit
Little known fact: the P40 shares the same PCB design as the 1080Ti FE and Titan Xp. You can use any waterblock for those to cool it, with only a small modification required: either desolder the EPS connector and replace it with two 8-pin PCIe connectors (the holes are already there), or cut a 2x2cm square from the acrylic/POM cover of the block above the EPS connector.
The blocks I'm using here are all EK FC, a mix of 1080Ti and Titan Xp. For whatever reason, EK decided to put the ports on those two variants in different positions, hence the weird bend you see in the bridge (I designed it, and had it 3D printed in resin).
apollo_mg@reddit (OP)
This is really sick man. Good job. I just realized tonight how much better Q6 is at tool calling. But then I had to lower my context. So.. yeah clearly I need more. Many more.
FullstackSensei@reddit
Wait until you try Unsloth's Q8_K_XL. Your brain will be blown away again.
Shot_Restaurant_5316@reddit
Thanks, where do I find these waterblocks? I already read about it in another thread, but gave up, because I've never used water cooling and have some doubts about long-term stability. Can you share your experience?
FullstackSensei@reddit
Well, the 1080Ti is old. So you'll need to buy them used. I'm a big fan of watercooling. Machined my own CPU block around the turn of the millennium, years before the first commercial products were available. Then spent more than 20 years without owning a desktop.
It's not hard, really. Watch a lot of YouTube builds. Ignore the shiny ones and focus on the low cost ones. Also watch videos about what real corrosion looks like and what discoloration from using colored coolant looks like, and videos about how to open and wash/clean blocks and how to wash/clean radiators.
As for the build: No hard tubing. I like PVC soft tubing (aquarium tubing) because it's very cheap and very easy to work with. The fittings I get from AliExpress. Bykski, Barrow, Freezemod, etc. Whatever is on promotion. Just make sure you get everything for 10/13mm tubing. For the pump, I use a genuine Xylem D5. They're hyper reliable. You can find them under a ton of names. The easiest tell-tale is the "made in EU/Hungary" marking. I run them at 100% all the time, no PWM. I have a mish-mash of radiators, all bought used, but from the big brands like EK and Alphacool. The older lines are very cheap because they don't look "cool" like the new ones. I've gotten several 360mm radiators for €25-30 apiece. The front radiator in that rig is a 480mm Alphacool Monsta, 60 mm thick, and it cost me like €60 shipped, instead of the 200+ it costs new.
For coolant, I just use distilled and de-ionized water with a few drops of povidone-iodine as a biocide. Dirt cheap, very effective, and no risk of it clogging anything. My 3090 rig has been running with this for over a year. Zero issues. PVC tubing will always "ooze" water over time. I need to top up the reservoir of that rig once a year or so.
The hardest part isn't making the loop water tight; that's pretty easy with PVC tubing and a smidge of vaseline/petroleum jelly on the o-rings of the blocks after cleaning. It's getting all the air out. That can take 2 hours on a complex loop, where you need to tilt and turn the rig around all axes and directions.
To test for leaks, I just hook the pump to another PSU (shorting the two pins to turn it on) and let the pump run on the loop for 24 hours. This trick is also great to get all the air out without risking damage to the system, since nothing else is on, and the PSU is outside the case. If the level in the reservoir hasn't gone down after 24 hours, it's good to go.
Yes, it's considerably more work to put together than air, but I'd argue the benefits are worth it. First, everything runs cooler and quieter. Second, if you choose your blocks carefully, you can achieve crazy densities without using any risers. Third, speaking of risers, those can bring their own headaches and work. Just look at that pic, eight GPUs on a single board, all with no risers in sight. And did I mention how compact and quiet those things can be?
apollo_mg@reddit (OP)
Touche.
jeffzyxx@reddit
NICEEEE I have a P100 + Ryzen 5800x + 64GB DDR4 rig right now and I've got a second P100 coming in tomorrow. it's surprisingly capable! (Tho I am having to use a RX580 8GB until I can get the second P100 installed... still, with minor CPU offloading, I can run q8_0 at 100k+ context with >25tok/sec generation speeds!)
OldEffective9726@reddit
Dual p100 is nice
bobaburger@reddit
Nice! Did you test? How was the speed?
apollo_mg@reddit (OP)
I'm still learning what these cards like, but the best speed I've had so far with very limited testing is Qwen 35b MoE MTP. During a codebase investigation, by about 20k tokens it had climbed to over 50 tps.
PopeSchlongPaulII@reddit
I've been looking at those for something similar. What models are you running?
apollo_mg@reddit (OP)
So far I've only loaded up a few fine-tunes of Qwen 3.6 27b and 35b MoE. I just finished up the cooling last night and haven't had much time to spend with it yet. I briefly went down the MTP rabbit hole for a while earlier and had good initial results. Needs more testing on my end.
apollo_mg@reddit (OP)
About 6 inches away from the front of the PC mesh front panel. Fans are 2x 24v GDSTime 120mm blower fans from Amazon on a separate power supply. Can probably get away with 12v versions based on how ridiculous these are.
JC1DA@reddit
This is mine: 4x3090
apollo_mg@reddit (OP)
Sick. What mobo?
JC1DA@reddit
I'm using a Huananzhi H12D-8D with an EPYC CPU.
Sofakingwetoddead@reddit
Did you make the fan adapters yourself or is there a source for them? Also, very cool!!!
apollo_mg@reddit (OP)
I adapted someone else's design. I can send you the STL or upload it to Thingiverse if you want it. Temps are very good with the 120mm blower.
Sofakingwetoddead@reddit
Thank you! I really appreciate that but I haven't ventured into the 3d printing world. I was just curious but I may take you up on that in the future!
apollo_mg@reddit (OP)
No problem. If you or anyone else takes the plunge and need ducts I'd gladly print them for materials cost.
Sofakingwetoddead@reddit
Wow, thank you for the offer. That'd be incredible and of course I would compensate. Glad to know it's an option. Options are good. TY
kwizzle@reddit
Are those fans very loud? Is it even possible to get fans that can cool those server cards that don't sound like jet engines?
apollo_mg@reddit (OP)
They are very noticeable, but not loud per se. I'll take a video in a few minutes.
No_Draft_8756@reddit
I would also add the 9070 XT. With 48 GB you can run really good models maybe up to 80B (quantized)
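A rough way to sanity-check the "up to 80B (quantized)" figure is to estimate the size of the weights alone. This is only a back-of-the-envelope sketch: it assumes the P100s are the 16 GB variant, that all three cards can be pooled by the inference backend, and that the bits-per-weight values and the 4 GB overhead for KV cache and buffers are illustrative guesses rather than measurements. The function names are hypothetical.

```python
# Rough weight-memory estimate for a quantized model.
# Bits-per-weight values and the overhead figure are illustrative assumptions.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float = 4.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in the given VRAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


# Assumption: two 16 GB P100s plus one 16 GB 9070 XT, pooled together.
TOTAL_VRAM_GB = 16 + 16 + 16

for bpw in (4.0, 4.5, 5.5, 8.0):
    size = weights_gb(80, bpw)
    ok = fits(80, bpw, TOTAL_VRAM_GB)
    print(f"80B @ {bpw} bits/weight: {size:.1f} GB weights, fits in 48 GB: {ok}")
```

Under these assumptions an 80B model only squeezes in at roughly 4 bits per weight, with little room left for context, so "maybe" is doing real work in that estimate.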
apollo_mg@reddit (OP)
I am going to keep that on its own for faster performance tasks. Plus I still occasionally play games... well. I used to...
No_Draft_8756@reddit
Get that point. For me it is similar
itssethc@reddit
How much did they run you? Stacking decent older cards is something I'm considering heavily instead of upgrading my single card.
apollo_mg@reddit (OP)
They are incredibly cheap right now. Got them both from the same eBay seller for $88 USD/each shipped.
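For anyone comparing against other cards, the price works out as below. A quick sketch: the $88 price comes from the comment above, while the 16 GB per card is an assumption (the P100 also shipped in a 12 GB variant), and the helper name is made up for illustration.

```python
# Cost per GB of VRAM for a stack of identical used cards.

def usd_per_gb(price_per_card: float, cards: int, vram_per_card_gb: float) -> float:
    """Total purchase price divided by total VRAM."""
    return (price_per_card * cards) / (vram_per_card_gb * cards)


PRICE_PER_CARD_USD = 88   # from the thread
CARDS = 2
VRAM_PER_CARD_GB = 16     # assumption: 16 GB P100 variant

total_cost = PRICE_PER_CARD_USD * CARDS
total_vram = VRAM_PER_CARD_GB * CARDS
print(f"${total_cost} for {total_vram} GB = "
      f"${usd_per_gb(PRICE_PER_CARD_USD, CARDS, VRAM_PER_CARD_GB):.2f} per GB")
```

Swap in the price and VRAM of any other card to compare it on the same basis.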
hautdoge@reddit
Is it right next to the wall? Give it some space to breathe dude
apollo_mg@reddit (OP)
I did. I noticed that too when I took the pic.