Behold! Probably the most ghetto local AI server:
Posted by MackThax@reddit | LocalLLaMA | 257 comments
AKA: Jank Incarnate
After months of pain, I finally got a working setup.
There are a bunch of quirks to running a multi-Tesla setup. I was planning to write something about my experience once I got it running.
Currently, the fans are plugged into the wall and their speed is controlled with a knob. I still gotta wire up a PWM controller for them.
Sea_sociate@reddit
Doesn't look half bad
vir_db@reddit
Please... 😂
Ok-Internal9317@reddit
MackThax@reddit (OP)
you win
ahtolllka@reddit
Nah, he doesn't have fire-hazard plastic right on the backplate, you're still the king 😬 Be careful, really. I didn't think much of it until, on a 48h run, my 8x3090 decided it was throttling time.
Fresh-Letterhead986@reddit
I second that. You win. "I can't be bothered to have a case, or even a table, + fire trap P100. I'm not concerned that the PCIe mounting bracket is cantilevering the 75W PCIe power pins partially out, because they're pushing the card up, because there's nothing under the motherboard. That probably won't cause a fire by reducing the surface contact area used for 5-6 amps of power delivery." An SD card, which belongs nowhere near such a system, and a Chinesium power supply with scary-low numbers (450W?) and zero English. But, I guess it says "AI" on it.
just when i was missing the heady days of r/buttcoin, you've brought me right back. This is glorious. Consider adding a 2nd P100. thank you.
Terrible-Ad-6794@reddit
No! This is a masterpiece!
UltraFOV@reddit
Very cool. You will need to increase that RAM though.
MackThax@reddit (OP)
why?
UltraFOV@reddit
16GB of RAM, you're right to ask. If you keep all of your models in VRAM you will be OK. Having a decent amount of RAM helps with offloading when running a model that can't fit well enough in VRAM.
MackThax@reddit (OP)
Well, that's why I went for lots of VRAM
UltraFOV@reddit
I have 256GB of VRAM and I still spill over into RAM. But I take it you're not running frontier models.
FullstackSensei@reddit
Very curious to hear what issues you had. By any chance, was it the motherboard crapping itself because of the multiple GPUs?
MackThax@reddit (OP)
Oh, you know it.
FullstackSensei@reddit
It's a very recurring theme on this sub. So many consumer boards crap themselves out with multiple GPUs. Personally, I only use server boards and never had an issue, even with eight 24GB GPUs on a single board.
sowerandreaper@reddit
People never think to check the PCIe lane support of motherboards and CPUs!
FullstackSensei@reddit
More important than lanes is the BIOS ability to support multiple GPUs. But you're right. I find it strange that nobody looks at workstation or server platforms. A PCIe Gen 5 platform is useless if you're going to neuter each GPU to 4 lanes and then downgrade to Gen 4 because of link stability issues. Might as well go with Gen 3 and get 8 problem-free lanes. At least you'll get 40 or 48 lanes.
michaelsoft__binbows@reddit
These days I feel like a gen 4 PCIe switch offers great middle ground where you drop to gen 4 and can host 4x GPUs off a consumer platform with a PEX88096 to get x16 gen 4 from host to cards and x16 gen 4 peer to peer between cards. All you give up is dedicated x16 gen 4 to each card, which was never in the cards on a consumer rig.
I do feel it is more compelling now that RAM prices you well out of workstation and server platforms. If you have multiple Pro 6000s then it's not much of a big deal, but let's say you are scaling 3090s: a single $300 PEX88096 can let you make a quad 3090 with basically zero compromises running off a potato CPU/mobo, as far as I can tell here (see the ServeTheHome Forums thread "Multi-NVMe (m.2, u.2) adapters that do not require bifurcation").
FullstackSensei@reddit
The issue with Gen 4 isn't just lanes. Until very recently, you could get epyc boards for less than the cost of that PLX card. Even now, you can find the occasional deal if you're a bit savvy.
The thing with Gen 4 is that it's a lot more sensitive to noise. So, your cables also have to be Gen 4 (more expensive) and your riser needs to be able to handle Gen 4. And all this for what? 3090s don't have p2p anyway. And we haven't even talked about how so many consumer boards crap themselves when more than two GPUs are connected, even ignoring lane count.
If you go with an Epyc, you get 128 lanes, many boards expose 90+ of those on PCIe slots. At X8 per GPU, that's 12 GPUs without sweat, while having ~16GB/s dedicated bandwidth from the CPU to each GPU. You'll absolutely need that without p2p because all communication has to go through the CPU (and the code running on CPU). With a PLX, bandwidth to the CPU quickly becomes a bottleneck.
You'll actually see worse performance using that PLX card, because all your GPUs will have to use that single x16 link to the CPU, whereas now you have 20 lanes. And you're lucky your motherboard didn't crap itself out already with three GPUs.
I have four 3090s in an Epyc system, and I can see 25GB/s per GPU running models across all four using ik_llama.cpp. That's 100GB/s (byte, not bit) flowing in and out of all four during inference.
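For a rough sense of the lane numbers in this exchange, here is a minimal sketch that computes theoretical one-direction PCIe bandwidth from the per-lane transfer rate (8, 16 and 32 GT/s for Gen 3, 4 and 5, all using 128b/130b encoding) and what each GPU gets when four of them share one Gen 4 x16 switch uplink. It ignores protocol overhead and assumes every GPU talks to the CPU at the same time, so treat the results as ceilings, not benchmarks. The function names are illustrative.

```python
# Approximate one-direction PCIe bandwidth in GB/s.
# Gen3 = 8 GT/s per lane, Gen4 = 16 GT/s, Gen5 = 32 GT/s; all use 128b/130b encoding.
# Protocol overhead (packet headers etc.) is ignored, so real transfers land lower.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

def link_gbps(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth of a PCIe link in GB/s."""
    return GT_PER_S[gen] * (128 / 130) / 8 * lanes

def per_gpu_to_cpu(gen: int, uplink_lanes: int, gpus_sharing: int) -> float:
    """Bandwidth each GPU gets to the CPU if all of them transfer at once
    through one shared uplink (e.g. a PCIe switch / PLX card)."""
    return link_gbps(gen, uplink_lanes) / gpus_sharing

if __name__ == "__main__":
    print(f"Gen3 x8:  {link_gbps(3, 8):5.2f} GB/s")
    print(f"Gen4 x8:  {link_gbps(4, 8):5.2f} GB/s")   # the '~16GB/s' per GPU on Epyc
    print(f"Gen4 x16: {link_gbps(4, 16):5.2f} GB/s")
    # Four GPUs behind one Gen4 x16 switch uplink, all talking to the CPU:
    print(f"4 GPUs via switch: {per_gpu_to_cpu(4, 16, 4):5.2f} GB/s each")
```

Under those assumptions a Gen 4 x8 link comes out at about 15.75 GB/s, while four GPUs saturating a single Gen 4 x16 uplink each get the equivalent of a Gen 3 x8 link (roughly 7.9 GB/s).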
AFulminata@reddit
I'm fairly new to the space; can you elaborate on what you mean by this? Maybe an example board for Gen 3 vs. Gen 5?
FullstackSensei@reddit
Your average AM5 has 24 Gen 5 lanes. A 10 year old workstation platform such as X299 has 48 total lanes, and actually more memory bandwidth than the latest Zen5 Ryzen (quad channel DDR4, up to 4266, vs dual channel DDR5).
If you go server, the 10 year old LGA3647 has 48 Gen 3 lanes and another 8 from the chipset for a pair of M.2 SSDs. It has six DDR4 memory channels at 2666, that when fully populated give you almost 2x the memory bandwidth of DDR5-5600. Most boards use a few lanes for things like built-in NIC and the like, but you still get 40 lanes on slots. That's 5 GPUs at X8 each. If you get a 20+ core CPU, you'll also be able to run 400B models at Q4 at 12-18t/s offloading to CPU, depending on which GPUs you have.
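The memory-bandwidth comparison in these two comments comes down to simple arithmetic: theoretical peak bandwidth is channels × 8 bytes per 64-bit channel × transfer rate (MT/s). A minimal sketch applying that to the three platforms mentioned, assuming the listed memory speeds are actually reached; sustained bandwidth in practice is lower than these peaks.

```python
def dram_gbps(channels: int, mt_per_s: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (each channel is 64 bits = 8 bytes wide)."""
    return channels * 8 * mt_per_s / 1000  # MB/s -> GB/s

# (channels, MT/s) as quoted in the thread
PLATFORMS = {
    "AM5, dual-channel DDR5-5600": (2, 5600),
    "X299, quad-channel DDR4-4266": (4, 4266),
    "LGA3647, six-channel DDR4-2666": (6, 2666),
}

for name, (channels, speed) in PLATFORMS.items():
    print(f"{name}: {dram_gbps(channels, speed):.1f} GB/s")
```

On these peak figures, six-channel DDR4-2666 (about 128 GB/s) comes out roughly 1.4x ahead of dual-channel DDR5-5600 (about 90 GB/s), and quad-channel DDR4-4266 (about 137 GB/s) is ahead of both.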
Overall-Animator9323@reddit
Thanks for this. With the approach you describe, installing two 3090s, does it make too much noise and heat to share a room with the user? Is the power consumption high?
FullstackSensei@reddit
Wrong on both counts. I have an 8 GPU machine under my desk, and even using tensor parallelism it consumes ~800W at the wall during inference, despite the paper specs saying 2500W.
Noise is completely dependent on how you engineer it. Said 8 GPU machine is quieter under load than a laptop.
Overall-Animator9323@reddit
That sounds good! Do you have a post where you talk about that setup? It would be very instructive. I've been mulling over using an old workstation for a few weeks, but the information I find raises exactly the concerns I was asking you about.
MackThax@reddit (OP)
This needs to be a pinned post. It was near-impossible to find that info.
FullstackSensei@reddit
There are so many of these stories. I see at least a couple every month.
MackThax@reddit (OP)
Then a curse hid them from me.
techdevjp@reddit
You have to use Google to search reddit effectively.
Add site:reddit.com/r/LocalLLaMA to limit searches to just this sub.
WiseassWolfOfYoitsu@reddit
Yep, I was able to get it working on some special consumer boards, but the best move I ever made was getting a used Threadripper WRX80 board for the inference stuff.
FullstackSensei@reddit
Now imagine how much you'd have saved had you gone for an Epyc board instead of TR.
pengy99@reddit
Sir, ghetto would have way more zip ties, no case, just sitting on a wire shelf.
BroadAddendum7134@reddit
Please share how you run vLLM and the script parameters. I have the same setup and I'm trying to use tensor parallelism.
MackThax@reddit (OP)
haven't gotten that far
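Since tensor parallelism on three cards came up: in vLLM the tensor-parallel size generally has to divide the model's attention-head count evenly, which makes three GPUs an awkward number. The hypothetical helper below lists the (tensor-parallel, pipeline-parallel) layouts that use every GPU under that one constraint. It assumes you know the head count from the model's config and ignores any other limits (KV heads, per-architecture support, your vLLM version's hardware support), so treat it as a first filter only.

```python
def parallel_layouts(num_gpus: int, num_attention_heads: int) -> list[tuple[int, int]]:
    """Return (tensor_parallel, pipeline_parallel) pairs whose product is num_gpus
    and whose tensor-parallel size evenly divides the attention-head count.

    Hypothetical helper: only the head-divisibility rule is modelled here.
    """
    layouts = []
    for tp in range(1, num_gpus + 1):
        if num_gpus % tp == 0 and num_attention_heads % tp == 0:
            layouts.append((tp, num_gpus // tp))
    return layouts

# A model with 32 attention heads on 3 GPUs: tensor parallelism across all three is out.
print(parallel_layouts(3, 32))
```

For 32 heads on three cards this leaves only pipeline parallelism across all three (tp=1, pp=3). The two numbers map onto vLLM's --tensor-parallel-size and --pipeline-parallel-size options.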
Positive-Protection1@reddit
I can now tell my family they have no right to complain about how my rig looks.
Vicar_of_Wibbly@reddit
Wonderful! I love it. Allow me to buck the trend - my rig used to look quite similar to yours until (after extensive and careful planning) a Christmas holiday project led to this:
It's a cube with 4x RTX 6000 Pro Workstation cards (two per side) oriented to blow outwards. The frame is made from 400mm 4040 aluminum extrusion and custom 3D-printed corners, brackets, LED matrix screen, etc. that I designed in Tinkercad. It runs an AMD Epyc with a Silverstone AIO for cooling. Intake fans are Noctua Chromax black swap 200mm and 120mm, with 92mm for DRAM cooling.
More photos on my vibe-coded photo blog-type-thing here: https://blraaz.net (no trackers, cookies, ads, product placements or anything like that - just DIY AI server porn).
michaelsoft__binbows@reddit
What a truly sick rig. Optimizing the hot exhaust airflow is wonderful, and I'm a big fan of the aluminum extrusion, the whole thing is amazing. What do you run on it these days?
I guess if I had to suggest something: the 3D-printed corner pieces connecting the extrusions will see by far the greatest mechanical stress, so personally I would have gone with chunky steel hardware there, just to reach the proper level of overkill. But the idea of popping in squash balls for feet is amazing too.
Vicar_of_Wibbly@reddit
Thank you!
It’s running mostly MiniMax M2.7 FP8 for agentic work, which really flies. The comment regarding steel corners is apropos of the moment because I have on my bench a bunch of steel corner brackets and internal brackets I’ll be adding soon. They’re completely unnecessary for strength, it’s already ungodly strong, but once painted black they’ll help make it look pretty industrial as well as being indestructible.
MackThax@reddit (OP)
sick
KoalaCloaca@reddit
MackThax@reddit (OP)
bro managed his cables
LankyGuitar6528@reddit
It's beautiful! Beats the hell out of my 120TB RAID Array running on Microsoft Storage Spaces...
MackThax@reddit (OP)
this guy actually doing something useful with his AI
LankyGuitar6528@reddit
Yes! I'm going to point it at Robinhood. No need to even check my account... I'm too busy out shopping for Lambos. Nothing can go wrong.
Nnazeroth@reddit
You have a long way to go, my friend
LankyGuitar6528@reddit
Omg... that.. thing... if it gets up in the middle of the night and murders you I wouldn't be surprised. AGI confirmed.
Nnazeroth@reddit
IT has tried but my t100 stopped it
MackThax@reddit (OP)
RGB for extra tokens
Nnazeroth@reddit
You cannot buy anything without RGB today... I hate them!
dave-tay@reddit
Oooh llama porn... very nice
MackThax@reddit (OP)
Now, that's a new image in my head...
sagiroth@reddit
Well, with this setup you are well off for a video...
kaisurniwurer@reddit
Getting a mount on the side of the case actually seems like a quite neat way of fitting more GPUs.
Not only are the PCI adapters straight, but also there is a lot of free real estate there.
You could technically print holders for transparent (or noise-cancelling) sheets to cover it up and get some fans to redirect some air. Though jank is fine too.
Or make the holder clips slot into the case cover, though that is a lot more work with the cutting and such.
marktuk@reddit
What do you use your local AI for?
MackThax@reddit (OP)
I didn't plan that far.
sagiroth@reddit
This should be a headline of this sub.
marktuk@reddit
Fair! 😂
I got ollama running recently and while it's cool I can run models locally, I haven't found them particularly useful because the quality is so poor compared to commercial models.
MackThax@reddit (OP)
How big models?
marktuk@reddit
9B so far, only got 8GB VRAM unfortunately
twoiko@reddit
Try using an optimized MOE with CPU offloading
voltaire321123@reddit
9B isn't so bad as long as you're using it for agentic tasks and not relying on its training data. I built a local web-researcher chat agent that's based around Qwen3.5-9B. It rocks for that sort of stuff. It's not great for coding tasks, though.
MackThax@reddit (OP)
Yeah, that's useless. 30B is where it starts getting interesting.
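A quick way to see why 8GB of VRAM tops out around 9B while ~30B wants a bigger card is to estimate weight size as parameters × bits per weight / 8. This minimal sketch assumes roughly 4.85 bits per weight for a Q4_K_M-style GGUF quant and a flat 1.5 GB allowance for KV cache and runtime buffers; both numbers are assumptions, and real usage grows with context length.

```python
# Assumptions (not measured): ~4.85 bits/weight for a Q4_K_M-class quant and a flat
# 1.5 GB for KV cache, CUDA context and scratch buffers.
Q4_BITS_PER_WEIGHT = 4.85
OVERHEAD_GB = 1.5

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1e9 params * bits / 8)."""
    return params_billion * bits_per_weight / 8

def fits(params_billion: float, vram_gb: float,
         bits_per_weight: float = Q4_BITS_PER_WEIGHT,
         overhead_gb: float = OVERHEAD_GB) -> bool:
    """True if the estimated weights plus overhead fit in the given VRAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb

for params in (9, 24, 30):
    size = weights_gb(params, Q4_BITS_PER_WEIGHT)
    print(f"{params}B @ Q4: ~{size:.1f} GB weights, "
          f"fits 8GB: {fits(params, 8)}, fits 32GB: {fits(params, 32)}")
```

Raising bits_per_weight (a bigger quant) scales the weight size proportionally, so the headroom shrinks quickly on small cards.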
PigSlam@reddit
Can confirm. I just put a Radeon Pro AI R9700 32GB in my Fractal Ridge gaming PC (it replaced an RX 9070). I figure my Ridge must be one of the very few on the planet with that configuration, but my setup is rather pedestrian compared with your work of art.
marktuk@reddit
Yeah unfortunately I am just not willing to spend thousands on a GPU to run that, I could just upgrade my Claude plan 🤷♂️
MackThax@reddit (OP)
It's either $$$ for a pro GPU, sanity for a project like this or your soul to the corpo-feudalists. I just love living in the future.
Prof_ChaosGeography@reddit
Switch to llama.cpp and maximize the quant size. You'll find they are a ton better and faster now, as Ollama is just a wrapper that trades speed and quality for ease of entry.
Nyghtbynger@reddit
I don't see any cardboard
noo8-@reddit
I like it, I like it a lot. Gives me Fallout vibes ;)
Jolly_Criticism9190@reddit
wHY DoNt u jUsT BUy A bLAcKwELl RtX 6000 w/ 96gB VRAm? u oNLy NEeD oNe PcIe SLot
MackThax@reddit (OP)
I now have to convince myself this project and the money saved were worth my sanity.
Wallaby989@reddit
it is not the destination but the journey
WiseassWolfOfYoitsu@reddit
I sometimes pull up the page and long for one... but my MI100 system has more memory and cost less than half as much as the Blackwell, and that's WITH the memory and Threadripper
HOLYxFAMINE@reddit
How difficult has it been to run multiple MI100s? I've convinced my CFO to let me look into local LLM hosting for our 500-person company (not 500 concurrent users).
WiseassWolfOfYoitsu@reddit
Make sure to get a set with an Infinity Fabric link. It just works with an off-the-shelf ROCm-based llama.cpp server container now, as long as it's ROCm 7 or newer. It could be crashy before that.
I'm currently playing with 70B models, but planning on trying out GLM 4.7 - with enough system memory and MOE offloading it should work at something like Q4.
alexanderi96@reddit
hell yeah
Visual_Internal_6312@reddit
Ok wait, you have 96gb vram BUT use a HDD?
MackThax@reddit (OP)
don't need fast storage
Visual_Internal_6312@reddit
That’s a datacenter brain with a NAS from your uncle’s basement.
spammmmmmmmy@reddit
I am curious to know how you are all attaching these disassociated PCIe cards. Do you get something like a PCIe extender ribbon cable?
MackThax@reddit (OP)
yes
__JockY__@reddit
Sometimes you just need some jank in your life :)
MackThax@reddit (OP)
bruh
ExtremeAdventurous63@reddit
“The most ghetto local AI server”? Hold my beer, I have 4 BC250s on the way to build the scratchiest AI cluster I can think of right now! Still, great job. I love these kinds of setups.
tiddayes@reddit
Here is my ghetto rig with 128gb of vram
tiddayes@reddit
LeatherRub7248@reddit
hot tits... what are those 3 arcs? what arcs are they?
tiddayes@reddit
There are 4x Arc Pro B70s. The 4th is on a riser since it interfered with the SSD.
LeatherRub7248@reddit
Very nice. Which mobo are you using, and what PCIe lane bandwidth are all 4 running at? Do they all just plug into the mobo?
MackThax@reddit (OP)
sick
jacobpederson@reddit
MackThax@reddit (OP)
what is going on here?
jacobpederson@reddit
Threw in the 5090 4090 and 3090 into the same rig. It is for a project that runs an LLM on the 3090, image gen on the 4090 and video gen on the 5090. Pulls 1600 watts from the wall :D
bruhhhhhhhhhhhh_h@reddit
It has a case.
Nice work though
Technical_Corner3553@reddit
Did you ever crypto mine? Ghetto miner rigs eat this for lunch.
old-mike@reddit
It looks fantastic! Wow!
Limp_Statistician529@reddit
Who needs a Mac Mini when you have this lmaooo
seasonedcynical@reddit
Funny how similar these 'rigs' can be :-)
1200W PSU, AliExpress X99 mobo (because of the dual PCIe x16 slots), which also came with the same Xeon CPU.
But in my case it's two modded 20GB RTX 3080s, picked up during my visit to China.
Very stable system, I must say.
stopwwIII@reddit
I was wondering what "ghetto" meant, and I don't love Google Translate, thanks man
BoxWoodVoid@reddit
So this is what your GF looks like? You could have covered her at least!
MackThax@reddit (OP)
She rambles a lot.
sourceholder@reddit
What are you using the HDD spinner for?
MackThax@reddit (OP)
I forgot to calculate that a very small number of 100GB models fit on a 230GB SSD.
tmvr@reddit
That SSD is probably the most infamous for sudden deaths, look for a spare while it works if you don't have one already and have a backup of the files from it you don't feel like re-downloading.
MackThax@reddit (OP)
I like a little danger in my life.
zeta_cartel_CFO@reddit
Don't knock it if it works. Although I'm curious about the token stats. Can you share those?
MackThax@reddit (OP)
There are some numbers in another comment
tangosox@reddit
I'm really curious about the V100. How would a V100 compare to a 3090, for instance? I know it doesn't have BF16 support and some other things, but I'm looking for an affordable way to run bigger models. I'm only running MoE models on my 4070S; gemma4 26b is the fastest, most coherent model I can run currently.
MackThax@reddit (OP)
More VRAM. Probably less performance, I don't know.
snipsuper415@reddit
Pfft you have a case
MackThax@reddit (OP)
it did have a date with an angle grinder though
MackThax@reddit (OP)
fair
feverdoingwork@reddit
what kinda performance are you getting out of this?
MackThax@reddit (OP)
there's another comment of mine with some numbers
SnowyOwl72@reddit
thats like 150+ watts idling GPUs. Beauty.
MackThax@reddit (OP)
Nah, each GPU uses around 30W idle. The more worrying thing are the 700W it uses when NOT idle.
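The idle-versus-load numbers translate into daily energy with a bit of arithmetic. This minimal sketch assumes three GPUs at ~30W idle each (90W, GPU draw only), ~700W under load, and a made-up usage pattern and electricity price; substitute your own figures.

```python
def daily_kwh(idle_w: float, load_w: float, load_hours: float) -> float:
    """Energy per day in kWh for a box that is under load for load_hours and idle otherwise."""
    if not 0 <= load_hours <= 24:
        raise ValueError("load_hours must be between 0 and 24")
    return (load_w * load_hours + idle_w * (24 - load_hours)) / 1000

IDLE_W = 3 * 30        # three GPUs idling at ~30W each (GPU draw only)
LOAD_W = 700           # draw under load, as reported
PRICE_PER_KWH = 0.30   # hypothetical tariff; substitute your own

kwh = daily_kwh(IDLE_W, LOAD_W, load_hours=2)
print(f"{kwh:.2f} kWh/day, about {kwh * PRICE_PER_KWH:.2f} per day at the assumed price")
```

With only two busy hours a day, the 22 idle hours (1.98 kWh) already outweigh the load hours (1.4 kWh), which is why idle draw matters for a box that stays on.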
FissionFusion@reddit
whats the little fan blowing into the GPU called?
MackThax@reddit (OP)
Little? You mean the 120mm blower fan? It's called "120mm Blower Fan" on Amazon.
corruptboomerang@reddit
Nah, not even, you've got at least 3 levels of jank up you can go through. What is it, the V100 or whatever it is that you plug into a PCIe adaptor.
MackThax@reddit (OP)
wat
corruptboomerang@reddit
Yeah, you can get a V100 16GB that uses an SXM2 connector, and a PCIe adapter... Dunno if there are any that let you use 2x V100s and have them interconnect; I'm sure some smart weirdos are working on it, though. Sounds like a great way to get cheap AI at home.
A V100 16GB is about $100-200, so a pair and a PCIe card (assuming eventually a PCIe adaptor will allow interlinking) could be in effect a 32GB card for under $500...
MackThax@reddit (OP)
ok
ryfromoz@reddit
Cue Cartman singing in the ghettoooo
thoquz@reddit
Nice 3D printed fan blower adapter, did you design it?
How are the other two GPU's cooled?
MackThax@reddit (OP)
Thanks! Yep. I caved and bought a 3D printer for this mf after a while of failing.
There's another big fan in the case (that white round sticker) that blows into those two.
tklein422@reddit
Best way to learn right here my friend! Keep up the good work!
Square_Elderberry_66@reddit
I have some questions but it won’t let me post 🥲
Megneous@reddit
I run a dual 1060 6GB setup. I don't wanna hear anything from you lol
grabber4321@reddit
Do you get free tokens out of the end of that GPU? Yes? Ok then its worth it.
MackThax@reddit (OP)
Free tokens and 600-800W of free heating. 🥵
twoiko@reddit
"free"
grabber4321@reddit
WIN WIN!
Sofakingwetoddead@reddit
IDK man. I don't see junk, I see beauty! Awesome setup!
gdwallasign@reddit
just got a couple v100s in myself
cicoles@reddit
Ran out of space but need more POWER!
WithoutReason1729@reddit
Your post is getting popular and we just featured it on our Discord! Come check it out!
You've also been given a special flair for your contribution. We appreciate your post!
I am a bot and this action was performed automatically.
LA_rent_Aficionado@reddit
Yes, this is exactly the kind of jerry-rig that makes this sub great.
I also forgot what quiet sounds like
LankyGuitar6528@reddit
OMG... what the hell is that?! It's so hideous it wraps right back around to stunning. Love it.
LA_rent_Aficionado@reddit
It’s the source of my dreaded power bill and thank you haha
jafarykos@reddit
card #4 in the stack is gasping for air. This is amazing. I saw a 4 3090 water cooled build for sale on local FB marketplace for $5k and I regret being unable to grab it. It was so pretty, but also we have yours.
FearFactory2904@reddit
In the spirit of sharing jank ai builds...
FearFactory2904@reddit
smb3something@reddit
Damn, you just went to town with some snips lol
jafarykos@reddit
the Kool-aid man over here in GPU form.
FearFactory2904@reddit
Lol yeah man. That p40 was going to fit one way or another. Later had a second one dangling outside the case using one of those cryptominer pcie 1x extensions.
cleversmoke@reddit
This looks so early 2000s, I love it
MackThax@reddit (OP)
This looks like it runs Windows 98 haha. I can hear the HDD scratch and the modem screech.
jafarykos@reddit
Step 1: Cut a hole in the box. Step 2: Put some GPUs in the box.
My power cables stuck out too much to put the lid on my 4U case with my 5070TI, so when I bought a 3090 I figured I'd just make a bunk bed situation. All the processing bits are on the lower bunk and the top bunk has the cards. This lets me get a weee bit more airflow.
But I ended up with an 8U cube that's kinda fucking heavy.
pixelworld_ai@reddit
This is the way
ElderberryLow4127@reddit
That’s the shit I like
jazir55@reddit
Ghetto is when you breadboard the entire mobo and then encase it inside a 3D-printed toaster shell.
Royale_AJS@reddit
If it runs and gets the job done, it ain’t jank.
Lesser-than@reddit
the cursed 16gb laptop sodimm in an adapter is maybe the best jank I have seen yet :)
LankyGuitar6528@reddit
...damn... did not see that. Jank level 10 unlocked.
sammoga123@reddit
You're missing the water tank to leave your neighbors without water, lol.
And I'm joking, but in these cases you realize that that stupid anti-AI argument is, well, really stupid.
LankyGuitar6528@reddit
Oh no... I just got a 3090... I should probably order some water...
SillyLLM@reddit
Look at those fancy 3D printed brackets! You’re not even using zip ties. Way more professional than some bitcoin mining rigs out there!
Equal_Giraffe8866@reddit
one loves to see it
TonyStark1500@reddit
Wow! Jealous. I was thinking of ordering one of these V100s on eBay. How good are they for local LLMs? I was thinking of getting one V100 32GB and running Qwen 3.6 27B on it. Or is that a terrible idea?
MackThax@reddit (OP)
I get 39T/s on Mistral small 24B, quant 4_M.
I really can't claim whether or not it's a good idea though.
TonyStark1500@reddit
Thanks for the info!!
Noob_Krusher3000@reddit
I love this build. Obviously, the VRAM is great, but how do 3 Teslas perform? I can imagine GGUFs being great for this.
MackThax@reddit (OP)
I gave some numbers in another comment.
riley_srt4@reddit
You could probably fit a smaller power supply in this system by undervolting the GPUs. Just something to consider.
Wyldkard79@reddit
Hey, what are the blue things holding up the gpus? Is that a 3d print? I need something like that as I have some Radeon MI25s I need to stack sideways like that. Dig the setup, if it works it's not ghetto, it's a work in progress.
MackThax@reddit (OP)
Definitely still WIP. 3D prints, yes.
Consistent_Maize1915@reddit
Get one of those open mining racks for like $45 on eBay.
MackThax@reddit (OP)
then I wouldn't get to use an angle grinder to make room for a fan
Infamous_Mud482@reddit
really hoping those mounting brackets are at least PETG. The sheen on the ducts makes me think it might be.
MackThax@reddit (OP)
is.
Total_Listen_4289@reddit
This is a work of art
Thebandroid@reddit
That’s what we’re calling ghetto?
I see that and raise you a 9070 XT plugged into a Z77 mobo with an i7-3330.
Woof9000@reddit
I see no cardboard or duct-tape, you still have ways to go, to claim that title.
Upstairs_Tie_7855@reddit
__JockY__@reddit
Yaaaasssss
jld1532@reddit
This feels like you've crossed over into fire hazard territory
MackThax@reddit (OP)
hahahaha yea boi
Upstairs_Tie_7855@reddit
Not so bad at 15-20% speed, enough to keep them from throttling during inference
Mauer_Bluemchen@reddit
Beautiful!
kiwibonga@reddit
Nice. I have a mile of plastic tubing and 3 nvlink boards and 4 V100s somewhere between my house and China right now.
This is my porn.
SureTie253@reddit
That's actually amazing :D I'm a super noob at this, but how do you use 3 GPUs? I mean, how do you add up 32+32+32=96?
CalBearFan@reddit
llama.cpp works well with multiple GPUs, I have 4 3090s on a Supermicro server board and it works great across them giving a 96gb capability
MackThax@reddit (OP)
I just run a GGUF model with kobold.cpp and it just works. That is after installing the correct drivers and CUDA framework. And all the other shit I had to deal with.
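On the "how does 32+32+32=96 work" question: llama.cpp-based runners such as kobold.cpp split the model's layers across the GPUs, so each card holds a slice and the slices run one after another. The sketch below only illustrates a proportional layer split with largest-remainder rounding; it is not llama.cpp's actual code (which also accounts for free memory and the output layer), and the layer counts used are hypothetical.

```python
def split_layers(n_layers: int, vram_gb: list[int]) -> list[int]:
    """Assign n_layers to GPUs in proportion to their VRAM (illustrative only).

    Integer arithmetic keeps the rounding exact: each GPU first gets the floor
    of its proportional share, then leftover layers go to the GPUs with the
    largest remainders (ties resolved in GPU order).
    """
    total = sum(vram_gb)
    counts = [n_layers * v // total for v in vram_gb]
    remainders = [n_layers * v % total for v in vram_gb]
    leftover = n_layers - sum(counts)
    order = sorted(range(len(vram_gb)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

# Hypothetical 40-layer model on three 32GB cards:
print(split_layers(40, [32, 32, 32]))
```

Because each card only stores its own slice, the usable capacity is close to the sum of the VRAM, minus per-card overhead for buffers and KV cache.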
kartblanch@reddit
I love the ghetto tech rise im seeing. Frankenstein machines to run llms are peak utilization of free will.
daMortarMerrier@reddit
Mine is almost that bad. Its name is "Abomination". Picture to follow.
jzzlr@reddit
looks like old cryptomining rigs, before crypto got all filthy. love it!
not_a_db_admin@reddit
the laptop sodimm in an adapter is somehow worse than the fan knob, and i mean it as a compliment
panchovix@reddit
How are the temps with that fan on the V100? I want to try a similar one for an A40.
MackThax@reddit (OP)
It got up to 82°C with a long job in Pi, with the fans on the lowest setting. Then I waddled my ass over, turned the knob halfway up, and it got down to 60°C. The fans are more than enough; one can cool two GPUs, no problem. I wanted as big a fan as possible to keep the noise down. I still have to make a PWM controller for the fans, but I'm confident it will be pretty quiet when idle.
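For the PWM controller still on the to-do list, one common approach is a fan curve: read the GPU temperatures, take the hottest (since one fan cools two cards), and linearly interpolate a duty cycle between a few breakpoints. The breakpoints below are hypothetical, loosely bracketing the 60 to 82°C seen above, and the code stops at computing the duty; actually driving a PWM pin depends on whatever controller gets built. The parser assumes the CSV output of nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits.

```python
import subprocess

# Hypothetical (temperature °C, PWM duty %) breakpoints; tune for your fans.
FAN_CURVE = [(40, 20), (60, 40), (75, 80), (85, 100)]

def duty_for(temp_c: float, curve=FAN_CURVE) -> float:
    """Linearly interpolate a PWM duty cycle (%) from a temperature."""
    if temp_c <= curve[0][0]:
        return curve[0][1]          # below the first breakpoint: minimum duty
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
    return curve[-1][1]             # above the last breakpoint: full speed

def parse_temps(csv_text: str) -> dict[int, int]:
    """Parse 'index, temperature' lines into {gpu_index: temp_c}."""
    temps = {}
    for line in csv_text.strip().splitlines():
        idx, temp = (field.strip() for field in line.split(","))
        temps[int(idx)] = int(temp)
    return temps

def read_temps() -> dict[int, int]:
    """Query all GPUs via nvidia-smi (requires the NVIDIA driver to be installed)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_temps(out)

# Example with canned output instead of a live query:
sample = "0, 82\n1, 60\n2, 55\n"
hottest = max(parse_temps(sample).values())
print(f"hottest GPU {hottest}°C -> {duty_for(hottest):.0f}% duty")
```

In a real loop you would call read_temps() every few seconds and add some hysteresis so the fans don't hunt around a breakpoint.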
panchovix@reddit
Nice! How much power do these V100 use, 300W as well?
MackThax@reddit (OP)
Max. 250W by default.
FailBait-@reddit
Ah I must have misremembered. 250 and then capping to 200 with minimal impact then.
FailBait-@reddit
Not OP but they can do up to 350, but you can power cap to 300 with only a 3-4% hit to performance with minimal fuss. I’ve done the same in my R740xd
ambient_temp_xeno@reddit
People have made coffee table books about weirder things than everyone's jank AI builds.
StardockEngineer@reddit
Looks fine to me.
-Ellary-@reddit
OP: Hi Everybody! I'm Johnny Catswill and tonight we gonna run TheDrummer_Behemoth-X-123B-v2.1-GGUF Q8
Colecoman1982@reddit
Eh, you've got a regular mid-tower case in there. I've seen much more ghetto setups, like hanging the motherboard, GPUs, and power supply off of a wire rack with a box fan blowing on it for cooling...
riconec@reddit
Have you considered nvlink between them? I am wondering how it improves speed
MackThax@reddit (OP)
I might be stupid, but I don't see where to plug the bridge in, on these cards.
riconec@reddit
Hard to describe but opposite side from pcie connector, there is a pcb visible with slits, nvlink bridge goes in there
MackThax@reddit (OP)
ahhhhh! I can see it! The backplate covers it. I probably have to remove it to expose it.
riconec@reddit
I think it is totally fine with the backplate; however, I am not sure how hard it is to find those bridges (especially for 3 cards) or how pricey they are. What kind of models are you running on them? I was thinking of getting a V100 32GB and later expanding with NVLink to one more.
MackThax@reddit (OP)
Haven't tested many models yet, but
https://www.reddit.com/r/LocalLLaMA/comments/1tpdt5m/comment/oo87bza/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
can999999999@reddit
Dude this is beautiful
GrapefruitMammoth626@reddit
Safe to say, we all appreciate this junk pile.
MackThax@reddit (OP)
I appreciate you <3
OwnerByDane@reddit
Does the RAM pose any issues? 16GB seems light
MackThax@reddit (OP)
Currently no, but oh boy did it. Capacity is not an issue, since everything is in VRAM.
smart4@reddit
All with just a 1200W PSU?
MackThax@reddit (OP)
Well, yeah, the GPUs are 250W and they don't all run 100% at the same time. Other components are negligible.
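The PSU reasoning here can be made explicit with a worst-case budget: every GPU at its power limit at the same time, plus the rest of the system. This minimal sketch uses the 1200W unit and 250W default limit mentioned in the thread, with a hypothetical 150W allowance for everything else; a power cap (set with nvidia-smi -pl, such as the 200W figure mentioned in the thread) raises the headroom.

```python
from typing import Optional

def psu_headroom(psu_w: float, gpu_w: float, n_gpus: int, other_w: float,
                 gpu_cap_w: Optional[float] = None) -> float:
    """Watts left over if every GPU hits its limit at once (a worst case, not typical load)."""
    per_gpu = gpu_w if gpu_cap_w is None else min(gpu_w, gpu_cap_w)
    return psu_w - (per_gpu * n_gpus + other_w)

OTHER_W = 150  # hypothetical allowance for CPU, board, drives and fans

print(psu_headroom(1200, 250, 3, OTHER_W))                 # stock 250W limit
print(psu_headroom(1200, 250, 3, OTHER_W, gpu_cap_w=200))  # capped to 200W
```

Even the uncapped worst case leaves a few hundred watts of margin under these assumptions, though GPUs can spike briefly above their limit, so keeping some margin is sensible.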
Madman20201987@reddit
What’s the adapter you're using for the DDR4 RAM?
MackThax@reddit (OP)
No idea lol. It was a surprise. I bought a small used PC just because it had RAM in it and when I opened it up - hello!
Ariquitaun@reddit
Janky and redneck. Love it.
sowerandreaper@reddit
Very nice! Would be interested to see your writeup and benchmarks!
dumbappsignup@reddit
I approve
Buildthehomelab@reddit
This reminds me of
MackThax@reddit (OP)
Ooh, wait, I'll dig up an old pic.
Buildthehomelab@reddit
oof thanks for trying
z3n777@reddit
ghetto ai is the best kind of ai
olliec42069@reddit
Actually that brings up a thought... why are there no hood models? I need one to speak to me in 90s/00s ebonics.
onephn@reddit
Ghetto ai is OUR ai
FatheredPuma81@reddit
Surely a big cardboard box would be better?
philmarcracken@reddit
im jelly, I want at least one Tesla V100 to pair with my 12gb 3080ti. Do you get the blower as part of the deal on one?
MackThax@reddit (OP)
Depends on what you find in the dumpster alongside it. Mine came without the case bracket even. No power adapters, no fans, nothing.
isopropoflexx@reddit
Hey, it's only weird if it doesn't work!
isopropoflexx@reddit
What model are you using with this setup, and what kind of (real world) TPS performance are you seeing?
I currently have a few individual LLM servers on my local network, with the primary being a multi-RTX 3090 build. It works really well, but it's also fairly expensive. I've looked at V100s many times as a possible option for another secondary/fringe LLM server, but since they're built on an older architecture, from what I saw, the options for what to run on them are fairly limited. So I'm very interested to hear more about your experience so far.
fuck_cis_shit@reddit
scavenging DDR4 from laptops, truly a sign of the times
zhambe@reddit
It's perfect. All you need is your cat to take a piss in it.
MackThax@reddit (OP)
haha After all the crap I had to deal with, I wouldn't be surprised if that happened even though I don't have a cat.
zhambe@reddit
I was so proud when I finally got my build all buttoned up... week later, some spider decided it was a perfect place to start building webs.
zipperlein@reddit
That's the Spirit. ;D
MackThax@reddit (OP)
hahahaha love it
onephn@reddit
cable management isnt rats nest how dare u call this jank
all jokes aside this is such a cool setup, the 3d printed mounting hardware makes this look sick af
MackThax@reddit (OP)
Thanks! I'm especially fond of the 3D printed air duct propping up the top fan.
DeltaSqueezer@reddit
You have 3D printed parts and metal struts! That's practically professional! Look at attempts from a couple of years back when GPUs were just balanced in a pile 😂
MackThax@reddit (OP)
Oh, I assure you, the balance of the parts is very precarious!
jarail@reddit
Nice build! We need more of this. Always inspiring how much people push forward the frontier of what can fit in a case.
MackThax@reddit (OP)
Fit? Inside? What do you mean "fit inside"?
jarail@reddit
Truly a mid-size tower.
LingonberryBorn2161@reddit
Interesting, same setup but with 5 P100s. But how is your setup not a noise farm? My blower fans are only full speed or nothing. They fail below 11V, which means full speed always, because the cards get hot even at idle.
MackThax@reddit (OP)
https://www.reddit.com/r/LocalLLaMA/comments/1tpdt5m/comment/oo828pi/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
lobopl@reddit
how many tokens/s you achieve?
MackThax@reddit (OP)
Mistral 3.5, q4_M 128B: 9.8T/s, Mistral small, q4_M 24B: 39T/s.
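These token rates line up reasonably well with a memory-bandwidth ceiling: when generating, each token has to stream roughly all of the weights once, so tokens/s cannot exceed bandwidth ÷ model size. With a layer split the GPUs take turns, so the ceiling is set by one card's bandwidth rather than the sum. The sketch assumes ~900 GB/s per V100, ~4.85 bits per weight for Q4_K_M, and that both models are dense; all three are assumptions, and the result is an upper bound, not a prediction.

```python
V100_BW_GBPS = 900.0        # approximate HBM2 bandwidth of one V100 (assumption)
Q4_BITS_PER_WEIGHT = 4.85   # rough average for a Q4_K_M GGUF (assumption)

def weights_gb(params_billion: float, bits_per_weight: float = Q4_BITS_PER_WEIGHT) -> float:
    """Approximate size of the quantized weights in GB."""
    return params_billion * bits_per_weight / 8

def decode_ceiling_tps(params_billion: float, bandwidth_gbps: float = V100_BW_GBPS) -> float:
    """Upper bound on generation speed for a dense model if every token reads all
    weights once. With a layer split, GPUs run in sequence, so one card's bandwidth
    is the limit."""
    return bandwidth_gbps / weights_gb(params_billion)

for params, measured in ((24, 39.0), (128, 9.8)):  # measured t/s as reported above
    ceiling = decode_ceiling_tps(params)
    print(f"{params}B: ceiling ~{ceiling:.1f} t/s, measured {measured} t/s "
          f"({measured / ceiling:.0%} of ceiling)")
```

Under these assumptions the reported speeds sit at roughly 63% and 84% of the ceiling, which is plausible for bandwidth-bound decoding; KV-cache reads and kernel overheads take up the rest.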
reto-wyss@reddit
That's a picture you can hear.
Littlepharaoh@reddit
Sounds like my tinnitus probably
Jetboy01@reddit
The only thing I'm hearing is the ai begging to be put out of its misery.
MackThax@reddit (OP)
Hahaha, it's actually *relatively* quiet. At least compared to a server blade. The fans are pretty big and barely need to spin when idle.
Fun_Assist7660@reddit
I think it’s pretty gangster
Bulky-Priority6824@reddit
damn one tipped iced tea away from a new build
MackThax@reddit (OP)
It's one sneeze away from falling over and bending important stuff.
grabber4321@reddit
Tell us whats the actual setup? Update the original post with specs!
MackThax@reddit (OP)
did do
HokkaidoNights@reddit
Ghettotech at its finest. I'm sick of seeing all these shiny rigs... this is the way!
Hedede@reddit
Why is the third V100 mounted outside? Is it because of the Delta blower? I find that a 120mm fan attached to the PCIe bracket is enough to cool multiple cards.
MackThax@reddit (OP)
Really? When used to 100%, they heat up a *lot*, quickly. I could try it.
There's another big fan inside the case, cooling those two cards. The third is up above because of packaging constraints. There was supposed to be a fourth one, but I'm having trouble with it still.
pot_sniffer@reddit
Ha this is unhinged, I love it
fizzy1242@reddit
damn that's outlandish, love it.
semangeIof@reddit
Beautiful