Behold! Probably the most ghetto local AI server: | TheaterFire

Posted by MackThax@reddit | LocalLLaMA | 257 comments

AKA: Jank Incarnate

After months of pain, I finally got a working setup.

There's a bunch of quirks about running a multi-Tesla setup. I was planning to write something about my experience after I get it running.

Currently, the fans are plugged into the wall, speed is controlled with a knob. I still gotta wire up a PWM controller for them.

[-]

Sea_sociate@reddit

Doesn't look half bad

[-]

vir_db@reddit

Please... 😂

[-]

Ok-Internal9317@reddit

[-]

MackThax@reddit (OP)

you win

[-]

ahtolllka@reddit

Nah, he doesn't have fire-hazardous plastic right on the backplate, you're still the king 😬 Be careful, really. I didn't think much of it until, on a 48h run, my 8x3090 said it's throttling time.

[-]

Fresh-Letterhead986@reddit

i second that. you win. "i cant be bothered to have a case, or even a table, + fire trap P100. I'm not concerned that the pcie mounting bracket is cantilevering the 75w PCIe power pins partially out, because they're pushing the card up, because there's nothing under the motherboard -- that probably won't cause a fire by reducing surface contact area used for 5-6 amps power delivery.". An SD card, which belongs nowhere near such a system, and a chineseium power supply that has scary low numbers 450w? and zero english. But, I guess it says "ai" on it.

just when i was missing the heady days of r/buttcoin, you've brought me right back. This is glorious. Consider adding a 2nd P100. thank you.

[-]

Terrible-Ad-6794@reddit

No! This is a masterpiece!

[-]

UltraFOV@reddit

Very cool. You will need to increase that RAM though

[-]

MackThax@reddit (OP)

why?

[-]

UltraFOV@reddit

16GB of RAM. You are right to ask: if you keep all of your models within VRAM you will be OK. Having a decent amount of RAM helps with offloading when running a model that can't fit well enough in VRAM.

[-]

MackThax@reddit (OP)

Well, that's why I went for lots of VRAM

[-]

UltraFOV@reddit

I have 256GB of VRAM and I still spill over into RAM. But I take it you are not running frontier models

[-]

FullstackSensei@reddit

Very curious to hear what issues you had. By any chance, was it the motherboard crapping itself because of the multiple GPUs?

[-]

MackThax@reddit (OP)

Oh, you know it.

[-]

FullstackSensei@reddit

It's a very recurring theme on this sub. So many consumer boards crap themselves out with multiple GPUs. Personally, I only use server boards and never had an issue, even with eight 24GB GPUs on a single board.

[-]

sowerandreaper@reddit

People never think to check the PCIe lane support of motherboards and CPUs!

[-]

FullstackSensei@reddit

More important than lanes is the BIOS ability to support multiple GPUs. But you're right. I find it strange that nobody looks at workstation or server platforms. A PCIe Gen 5 platform is useless if you're going to neuter each GPU to 4 lanes and then downgrade to Gen 4 because of link stability issues. Might as well go with Gen 3 and get 8 problem-free lanes. At least you'll get 40 or 48 lanes.

[-]

michaelsoft__binbows@reddit

These days I feel like a gen 4 PCIe switch offers great middle ground where you drop to gen 4 and can host 4x GPUs off a consumer platform with a PEX88096 to get x16 gen 4 from host to cards and x16 gen 4 peer to peer between cards. All you give up is dedicated x16 gen 4 to each card, which was never in the cards on a consumer rig.

I do feel it is more compelling now that RAM prices you well out of workstation and server platforms. If you have multiple Pro 6000s then it's not much of a big deal, but let's say you are scaling 3090s: a single $300 PEX88096 can let you make a quad 3090 with basically zero compromises running off a potato CPU/mobo, as far as I can tell from this thread: "Multi-NVMe (m.2, u.2) adapters that do not require bifurcation" (ServeTheHome Forums).

[-]

FullstackSensei@reddit

The issue with Gen 4 isn't just lanes. Until very recently, you could get epyc boards for less than the cost of that PLX card. Even now, you can find the occasional deal if you're a bit savvy.

The thing with Gen 4 is that it's a lot more sensitive to noise. So, your cables also have to be Gen 4 (more expensive) and your riser needs to be able to handle Gen 4. And all this for what? 3090s don't have p2p anyway. And we haven't even talked about how so many consumer boards crap themselves when more than two GPUs are connected, even ignoring lane count.

If you go with an Epyc, you get 128 lanes, many boards expose 90+ of those on PCIe slots. At X8 per GPU, that's 12 GPUs without sweat, while having ~16GB/s dedicated bandwidth from the CPU to each GPU. You'll absolutely need that without p2p because all communication has to go through the CPU (and the code running on CPU). With a PLX, bandwidth to the CPU quickly becomes a bottleneck.
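The lane arithmetic above can be sanity-checked with a short sketch. It assumes theoretical PCIe link rates (8, 16 and 32 GT/s per lane for Gen 3, 4 and 5, with 128b/130b encoding) and takes 96 slot lanes as a stand-in for "90+"; the function names are made up for illustration, and real throughput will be lower than these figures.

```python
# Theoretical PCIe transfer rates per lane, in giga-transfers per second.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}


def lane_bandwidth_gbs(gen: int) -> float:
    """One-direction bandwidth of a single lane in GB/s (128b/130b encoding)."""
    return GT_PER_S[gen] * (128 / 130) / 8  # bits to bytes


def link_bandwidth_gbs(gen: int, lanes: int) -> float:
    """One-direction bandwidth of a whole link in GB/s."""
    return lane_bandwidth_gbs(gen) * lanes


def gpus_supported(slot_lanes: int, lanes_per_gpu: int) -> int:
    """How many GPUs fit if each one gets a fixed number of lanes."""
    return slot_lanes // lanes_per_gpu


if __name__ == "__main__":
    # Gen 4 x8 per GPU, with 96 slot lanes assumed for the "90+" case.
    print(round(link_bandwidth_gbs(4, 8), 2), "GB/s per GPU")
    print(gpus_supported(96, 8), "GPUs at x8")
```

On these numbers a Gen 3 x16 link and a Gen 4 x8 link come out the same, which is why lane count and generation have to be weighed together.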

You'll actually see worse performance using that PLX card, because all your GPUs will have to use that single x16 link to the CPU, whereas now you have 20 lanes. And you're lucky your motherboard didn't crap itself out already with three GPUs.

I have four 3090s in an Epyc system, and I can see 25GB/s per GPU running models across all four using ik_llama.cpp. That's 100GB/s (byte, not bit) flowing in and out all four during inference.

[-]

AFulminata@reddit

I'm fairly new to this space, can you explain what you mean by this? Maybe an example board for Gen 3 vs Gen 5?

[-]

FullstackSensei@reddit

Your average AM5 has 24 Gen 5 lanes. A 10 year old workstation platform such as X299 has 48 total lanes, and actually more memory bandwidth than the latest Zen5 Ryzen (quad channel DDR4, up to 4266, vs dual channel DDR5).

If you go server, the 10 year old LGA3647 has 48 Gen 3 lanes and another 8 from the chipset for a pair of M.2 SSDs. It has six DDR4 memory channels at 2666, that when fully populated give you almost 2x the memory bandwidth of DDR5-5600. Most boards use a few lanes for things like built-in NIC and the like, but you still get 40 lanes on slots. That's 5 GPUs at X8 each. If you get a 20+ core CPU, you'll also be able to run 400B models at Q4 at 12-18t/s offloading to CPU, depending on which GPUs you have.
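A back-of-the-envelope check of the memory bandwidth comparison, as a sketch only: theoretical peak is channels times transfer rate times 8 bytes per 64-bit channel. The function name is hypothetical, and sustained bandwidth on real hardware is lower than these peaks.

```python
def peak_bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak in GB/s: channels x MT/s x bytes per transfer."""
    return channels * mt_per_s * bus_bytes / 1000  # MB/s to GB/s


lga3647 = peak_bandwidth_gbs(6, 2666)  # six-channel DDR4-2666
x299 = peak_bandwidth_gbs(4, 4266)     # quad-channel DDR4-4266
am5 = peak_bandwidth_gbs(2, 5600)      # dual-channel DDR5-5600

if __name__ == "__main__":
    for name, value in (("LGA3647", lga3647), ("X299", x299), ("AM5", am5)):
        print(f"{name}: {value:.1f} GB/s")
```

On paper this gives roughly 128 GB/s for the six-channel server board against roughly 90 GB/s for dual-channel DDR5-5600, so the theoretical gap is nearer 1.4x; measured numbers depend on the CPU and memory configuration.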

[-]

Overall-Animator9323@reddit

Thanks for this. Would the setup you're suggesting, with two 3090s installed, make too much noise and heat to sit in the same room as the user? Is the power consumption high?

[-]

FullstackSensei@reddit

Wrong on both counts. I have an 8 GPU machine under my desk and even using tensor parallelism it consumes ~800W at the wall during inference, despite the paper specs saying 2500W.

Noise is completely dependent on how you engineer it. Said 8 GPU machine is quieter under load than a laptop.

[-]

Overall-Animator9323@reddit

That looks promising! Do you have a post where you talk about that setup? It would be very instructive. I've been mulling over using an old workstation for a few weeks now, but the information I find keeps raising the doubts I asked you about.

[-]

MackThax@reddit (OP)

This needs to be a pinned post. It was near-impossible to find that info.

[-]

FullstackSensei@reddit

There are so many of these stories. I see at least a couple every month.

[-]

MackThax@reddit (OP)

Then a curse hid them from me.

[-]

techdevjp@reddit

You have to use Google to search reddit effectively.

Add site:reddit.com/r/LocalLLaMA to limit searches to just this sub.
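For anyone who wants to script this, here is a minimal sketch that builds such a search URL with Python's standard library. The helper name and the example query are made up for illustration.

```python
from urllib.parse import urlencode


def subreddit_search_url(query: str, subreddit: str = "LocalLLaMA") -> str:
    """Build a Google search URL limited to one subreddit via the site: operator."""
    scoped = f"site:reddit.com/r/{subreddit} {query}"
    return "https://www.google.com/search?" + urlencode({"q": scoped})


if __name__ == "__main__":
    print(subreddit_search_url("consumer motherboard multiple GPUs BIOS"))
```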

[-]

WiseassWolfOfYoitsu@reddit

Yep, I was able to get it working on some special consumer boards, but the best move I ever did was getting a used Threadripper WRX80 board for the inference stuff.

[-]

FullstackSensei@reddit

Now imagine how much you'd have saved had you gone for an Epyc board instead of TR.

[-]

pengy99@reddit

Sir, ghetto would have way more zip ties, no case, just sitting on a wire shelf.

[-]

BroadAddendum7134@reddit

Please share how you run vLLM and the script parameters. I have the same setup and I'm trying to use tensor parallelism

[-]

MackThax@reddit (OP)

haven't gotten that far

[-]

Positive-Protection1@reddit

I can now tell my family they have no right to complain about how my rig looks.

[-]

Vicar_of_Wibbly@reddit

Wonderful! I love it. Allow me to buck the trend - my rig used to look quite similar to yours until (after extensive and careful planning) a Christmas holiday project led to this:

It's a cube with 4x rtx6000 pro workstation (two per side) oriented to blow outwards. The frame is made from 400mm 4040 aluminum extrusion and custom 3D-printed corners, brackets, LED matrix screen, etc. that I designed in Tinkercad. Runs AMD epyc with Silverstone AIO for cooling. Intake fans are Noctua Chromax black swap 200mm and 120mm, with 92mm for DRAM cooling.

More photos on my vibe-coded photo blog-type-thing here: https://blraaz.net (no trackers, cookies, ads, product placements or anything like that - just DIY AI server porn).

[-]

michaelsoft__binbows@reddit

What a truly sick rig. Optimizing the hot exhaust airflow is wonderful, and I'm a big fan of the aluminum extrusion, the whole thing is amazing. What do you run on it these days?

I guess if I had to suggest something, 3d printed corner pieces to connect the extrusions will see by far the greatest mechanical stress so personally I would have gone with chunky steel hardware there just to go for the proper level of overkill there but the idea for popping squash balls in for feet is amazing too.

[-]

Vicar_of_Wibbly@reddit

Thank you!

It’s running mostly MiniMax M2.7 FP8 for agentic work, which really flies. The comment regarding steel corners is apropos of the moment because I have on my bench a bunch of steel corner brackets and internal brackets I’ll be adding soon. They’re completely unnecessary for strength, it’s already ungodly strong, but once painted black they’ll help make it look pretty industrial as well as being indestructible.

[-]

MackThax@reddit (OP)

sick

[-]

KoalaCloaca@reddit

[-]

MackThax@reddit (OP)

bro managed his cables

[-]

LankyGuitar6528@reddit

It's beautiful! Beats the hell out of my 120TB RAID Array running on Microsoft Storage Spaces...

[-]

MackThax@reddit (OP)

this guy actually doing something useful with his AI

[-]

LankyGuitar6528@reddit

Yes! I'm going to point it at Robinhood. No need to even check my account... I'm too busy out shopping for Lambo's. Nothing can go wrong.

[-]

Nnazeroth@reddit

You are a long way off, my friend

[-]

LankyGuitar6528@reddit

Omg... that.. thing... if it gets up in the middle of the night and murders you I wouldn't be surprised. AGI confirmed.

[-]

Nnazeroth@reddit

IT has tried but my t100 stopped it

[-]

MackThax@reddit (OP)

RGB for ekstra tokens

[-]

Nnazeroth@reddit

You cannot buy anything without RGB today... I hate them!

[-]

dave-tay@reddit

Oooh llama porn... very nice

[-]

MackThax@reddit (OP)

Now, that's a new image in my head...

[-]

sagiroth@reddit

Well, with this setup you are well off for a video...

[-]

kaisurniwurer@reddit

Getting a mount on the side of the case actually seems like a quite neat way of fitting more GPUs.

Not only are the PCI adapters straight, but also there is a lot of free real estate there.

You technically could print holders for a transparent (or noise cancelling) sheets to cover it up and get some fans to redirect some air. Though jank is fine too.

Or make the holder clips slot into the case cover, though that is a lot more work with the cutting and such.

[-]

marktuk@reddit

What do you use your local AI for?

[-]

MackThax@reddit (OP)

I didn't plan that far.

[-]

sagiroth@reddit

This should be a headline of this sub.

[-]

marktuk@reddit

Fair! 😂

I got ollama running recently and while it's cool I can run models locally, I haven't found them particularly useful because the quality is so poor compared to commercial models.

[-]

MackThax@reddit (OP)

How big are the models?

[-]

marktuk@reddit

9B so far, only got 8GB VRAM unfortunately

[-]

twoiko@reddit

Try using an optimized MOE with CPU offloading

[-]

voltaire321123@reddit

9B isn't so bad as long as you're using it for agentic tasks and not relying on its training data. I built a local web researcher chat agent that's based around Qwen3.5-9B. It rocks for that sort of stuff. It's not great for coding tasks though.

[-]

MackThax@reddit (OP)

Yeah, that's useless. 30B is where it starts getting interesting.

[-]

PigSlam@reddit

Can confirm. I just put a Radeon Pro AI R9700 32GB in my Fractal Ridge gaming PC (it replaced an RX 9070). I figure my Ridge must be one of the very few on the planet with that configuration, but my setup is rather pedestrian compared with your work of art.

[-]

marktuk@reddit

Yeah unfortunately I am just not willing to spend thousands on a GPU to run that, I could just upgrade my Claude plan 🤷‍♂️

[-]

MackThax@reddit (OP)

It's either $$$ for a pro GPU, sanity for a project like this or your soul to the corpo-feudalists. I just love living in the future.

[-]

Prof_ChaosGeography@reddit

Switch to llamacpp and maximize the quant size. You'll find they are a ton better and faster now as ollama is just a wrapper that trades speed and quality for ease of entry

[-]

Nyghtbynger@reddit

I don't see any cardboard

[-]

noo8-@reddit

I like it, I like it a lot. Gives me Fallout vibes ;)

[-]

Jolly_Criticism9190@reddit

wHY DoNt u jUsT BUy A bLAcKwELl RtX 6000 w/ 96gB VRAm? u oNLy NEeD oNe PcIe SLot

[-]

MackThax@reddit (OP)

I now have to convince myself this project and the money saved were worth my sanity.

[-]

Wallaby989@reddit

it is not the destination but the journey

[-]

WiseassWolfOfYoitsu@reddit

I sometimes pull up the page and long for one... but my MI100 system has more memory and cost less than half as much as the Blackwell, and that's WITH the memory and Threadripper

[-]

HOLYxFAMINE@reddit

How difficult has it been to run multiple MI100's? I've convinced my CFO to let me look into local llm hosting for our 500 person company (not 500 concurrent user).

[-]

WiseassWolfOfYoitsu@reddit

Make sure to get a set with Infinity Fabric. It just works with an off-the-shelf ROCm-based llama.cpp server container now, as long as it's ROCm 7 or newer. It could be crashy before that.

I'm currently playing with 70B models, but planning on trying out GLM 4.7 - with enough system memory and MOE offloading it should work at something like Q4.

[-]

alexanderi96@reddit

hell yeah

[-]

Visual_Internal_6312@reddit

Ok wait, you have 96gb vram BUT use a HDD?

[-]

MackThax@reddit (OP)

don't need fast storage

[-]

Visual_Internal_6312@reddit

That’s a datacenter brain with a NAS from your uncle’s basement.

[-]

spammmmmmmmy@reddit

I am curious to know how you are all attaching these disassociated PCIe cards. Do you get something like a PCIe extender ribbon cable?

[-]

MackThax@reddit (OP)

yes

[-]

JockY@reddit

Sometimes you just need some jank in your life :)

[-]

MackThax@reddit (OP)

bruh

[-]

ExtremeAdventurous63@reddit

“The most ghetto local AI server”? Hold my beer, I have 4 BC250s on the way to build the scratchiest AI cluster I can think of right now! Still, great job. I love these kinds of setups

[-]

tiddayes@reddit

Here is my ghetto rig with 128gb of vram

[-]

tiddayes@reddit

[-]

LeatherRub7248@reddit

hot tits... what are those 3 arcs? what arcs are they?

[-]

tiddayes@reddit

There are 4x Arc Pro B70s. The 4th is on a riser since it interfered with the SSD

[-]

LeatherRub7248@reddit

very nice.. which mobo are you using ? and what PCIE lane bandwidth are all 4 running? they all just hook into the mobo?

[-]

MackThax@reddit (OP)

sick

[-]

jacobpederson@reddit

[-]

MackThax@reddit (OP)

what is going on here?

[-]

jacobpederson@reddit

Threw the 5090, 4090 and 3090 into the same rig. It is for a project that runs an LLM on the 3090, image gen on the 4090 and video gen on the 5090. Pulls 1600 watts from the wall :D

[-]

bruhhhhhhhhhhhh_h@reddit

It has a case.

Nice work though

[-]

Technical_Corner3553@reddit

Did you ever crypto mine? Ghetto miner rigs eat this for lunch.

[-]

old-mike@reddit

It looks fantastic! Wow!

[-]

Limp_Statistician529@reddit

Who needs a Mac Mini when you have this lmaooo

[-]

seasonedcynical@reddit

Funny how similar these 'rigs' can be :-)
1200 W PSU, AliExpress X99 mobo (because of the dual PCIe x16 slots), which also comes with the same Xeon CPU.
But in my case, two modded 20GB RTX 3080s, picked up during my visit to China.
Very stable system, I must say.

[-]

stopwwIII@reddit

I was wondering about the meaning of "ghetto" and I don't love Google Translate, thanks man

[-]

BoxWoodVoid@reddit

So this is what your GF looks like? You could have covered her at least!

[-]

MackThax@reddit (OP)

She rambles a lot.

[-]

sourceholder@reddit

What are you using the HDD spinner for?

[-]

MackThax@reddit (OP)

I forgot to calculate that a very small number of 100GB models fit on a 230GB SSD.

[-]

tmvr@reddit

That SSD is probably the most infamous for sudden deaths, look for a spare while it works if you don't have one already and have a backup of the files from it you don't feel like re-downloading.

[-]

MackThax@reddit (OP)

I like a little danger in my life.

[-]

zeta_cartel_CFO@reddit

Don't knock if it works. Although I'm curious on the token stats. Can you share those?

[-]

MackThax@reddit (OP)

There are some numbers in another comment

[-]

tangosox@reddit

I'm really curious about the v100. What would be the comparison between a v100 and a 3090 for instance? I know it doesnt have bf16 support and some other things but I'm looking for an affordable way to run bigger models. I'm only running moe models on my 4070S, gemma4 26b is the fastest most coherent model I can run currently.

[-]

MackThax@reddit (OP)

More VRAM. Probably less performance, I don't know.

[-]

snipsuper415@reddit

Pfft you have a case

[-]

MackThax@reddit (OP)

it did have a date with an angle grinder though

[-]

MackThax@reddit (OP)

fair

[-]

feverdoingwork@reddit

what kinda performance are you getting out of this?

[-]

MackThax@reddit (OP)

there's another comment of mine with some numbers

[-]

SnowyOwl72@reddit

thats like 150+ watts idling GPUs. Beauty.

[-]

MackThax@reddit (OP)

Nah, each GPU uses around 30W idle. The more worrying thing is the 700W it uses when NOT idle.

[-]

FissionFusion@reddit

whats the little fan blowing into the GPU called?

[-]

MackThax@reddit (OP)

Little? You mean the 120mm blower fan? It's called "120mm Blower Fan" on Amazon.

[-]

corruptboomerang@reddit

Nah, not even, you've got at least 3 levels of jank up you can go through. What is it, the V100 or whatever it is that you plug into a PCIe adaptor.

[-]

MackThax@reddit (OP)

wat

[-]

corruptboomerang@reddit

Yeah, you can get a V100 16GB that uses an SXM2 connector, and a PCIe adaptor... Dunno if there are any that let you use 2x V100s and have them interconnect, I'm sure some smart weirdos are working on it though, sounds like a great way to get cheap AI at home.

A V100 16GB is about $100-200, so a pair and a PCIe card (assuming eventually a PCIe adaptor will allow interlinking) could be in effect a 32GB card for under $500...

[-]

MackThax@reddit (OP)

ok

[-]

ryfromoz@reddit

Cue cartman singing in the ghettoooo

[-]

thoquz@reddit

Nice 3D printed fan blower adapter, did you design it?

How are the other two GPU's cooled?

[-]

MackThax@reddit (OP)

Thanks! Yep. I caved and bought a 3D printer for this mf after a while of failing.

There's another big fan in the case (that white round sticker) that blows into those two.

[-]

tklein422@reddit

Best way to learn right here my friend! Keep up the good work!

[-]

Square_Elderberry_66@reddit

I have some questions but it won’t let me post 🥲

[-]

Megneous@reddit

I run a dual 1060 6GB setup. I don't wanna hear anything from you lol

[-]

grabber4321@reddit

Do you get free tokens out of the end of that GPU? Yes? Ok then its worth it.

[-]

MackThax@reddit (OP)

Free tokens and 600-800W of free heating. 🥵

[-]

twoiko@reddit

"free"

[-]

grabber4321@reddit

WIN WIN!

[-]

Sofakingwetoddead@reddit

IDK man. I don't see junk, I see beauty! Awesome setup!

[-]

gdwallasign@reddit

just got a couple v100s in myself

[-]

cicoles@reddit

Ran out of space but need more POWER!

[-]

LA_rent_Aficionado@reddit

Yes, this is exactly the kind of jerry-rig that makes this sub great.

I also forgot what quiet sounds like

[-]

LankyGuitar6528@reddit

OMG... what the hell is that?! It's so hideous it wraps right back around to stunning. Love it.

[-]

LA_rent_Aficionado@reddit

It’s the source of my dreaded power bill and thank you haha

[-]

jafarykos@reddit

card #4 in the stack is gasping for air. This is amazing. I saw a 4 3090 water cooled build for sale on local FB marketplace for $5k and I regret being unable to grab it. It was so pretty, but also we have yours.

[-]

FearFactory2904@reddit

In the spirit of sharing jank ai builds...

[-]

FearFactory2904@reddit

[-]

smb3something@reddit

Damn, you just went to town with some snips lol

[-]

jafarykos@reddit

the Kool-aid man over here in GPU form.

[-]

FearFactory2904@reddit

Lol yeah man. That p40 was going to fit one way or another. Later had a second one dangling outside the case using one of those cryptominer pcie 1x extensions.

[-]

cleversmoke@reddit

This looks so early 2000s, I love it

[-]

MackThax@reddit (OP)

This looks like it runs Windows 98 haha. I can hear the HDD scratch and the modem screech.

[-]

jafarykos@reddit

Step 1: Cut a hole in the box

Step 2: Put some GPUs in the box

..

My power cables stuck out too much to put the lid on my 4U case with my 5070TI, so when I bought a 3090 I figured I'd just make a bunk bed situation. All the processing bits are on the lower bunk and the top bunk has the cards. This lets me get a weee bit more airflow.

But I ended up with an 8U cube that's kinda fucking heavy.

[-]

pixelworld_ai@reddit

This is the way

[-]

ElderberryLow4127@reddit

That’s the shit I like

[-]

jazir55@reddit

Ghetto is when you breadboard the entire mobo and then encase it inside a 3D printed toaster shell.

[-]

Royale_AJS@reddit

If it runs and gets the job done, it ain’t jank.

[-]

Lesser-than@reddit

the cursed 16gb laptop sodimm in an adapter is maybe the best jank I have seen yet :)

[-]

LankyGuitar6528@reddit

...damn... did not see that. Jank level 10 unlocked.

[-]

sammoga123@reddit

You're missing the water tank to leave your neighbors without water, lol.

And I'm joking, but in these cases you realize that that stupid anti-AI argument is, well, really stupid.

[-]

LankyGuitar6528@reddit

Oh no... I just got a 3090... I should probably order some water...

[-]

SillyLLM@reddit

Look at those fancy 3D printed brackets! You’re not even using zip ties. Way more professional than some bitcoin mining rigs out there!

[-]

Equal_Giraffe8866@reddit

one loves to see it

[-]

TonyStark1500@reddit

Wow! Jealous. I was thinking of ordering one of these V100’s on eBay. How good are these for local LLMs? I was thinking of getting one V100 32GB and run Qwen 3.6 27B on it? Or is that a terrible idea?

[-]

MackThax@reddit (OP)

I get 39T/s on Mistral small 24B, quant 4_M.

I really can't claim whether or not it's a good idea though.

[-]

TonyStark1500@reddit

Thanks for the info!!

[-]

Noob_Krusher3000@reddit

I love this build. Obviously, the VRAM is great, but how do 3 Teslas perform? I can imagine GGUFs being great for this.

[-]

MackThax@reddit (OP)

I gave some numbers in another comment.

[-]

riley_srt4@reddit

You could probably fit a smaller power supply in this system by under volting the GPUs. Just something to consider.

[-]

Wyldkard79@reddit

Hey, what are the blue things holding up the gpus? Is that a 3d print? I need something like that as I have some Radeon MI25s I need to stack sideways like that. Dig the setup, if it works it's not ghetto, it's a work in progress.

[-]

MackThax@reddit (OP)

Definitely still WIP. 3D prints, yes.

[-]

Consistent_Maize1915@reddit

Get one of those open mining racks for like $45 on eBay

[-]

MackThax@reddit (OP)

then I wouldn't get to use an angle grinder to make room for a fan

[-]

Infamous_Mud482@reddit

really hoping those mounting brackets are at least PETG. The sheen on the ducts makes me think it might be.

[-]

MackThax@reddit (OP)

is.

[-]

Total_Listen_4289@reddit

This is a work of art

[-]

Thebandroid@reddit

That’s what we’re calling ghetto?

I see that and raise you a 9070 XT plugged into a Z77 mobo with an i7-3330.

[-]

Woof9000@reddit

I see no cardboard or duct-tape, you still have ways to go, to claim that title.

[-]

Upstairs_Tie_7855@reddit

[-]

JockY@reddit

Yaaaasssss

[-]

jld1532@reddit

This feels like you've crossed over into fire hazard territory

[-]

MackThax@reddit (OP)

hahahaha yea boi

[-]

Upstairs_Tie_7855@reddit

Not so bad at 15-20% speed, enough to keep them from throttling during inference

[-]

Mauer_Bluemchen@reddit

Beautiful!

[-]

kiwibonga@reddit

Nice. I have a mile of plastic tubing and 3 nvlink boards and 4 V100s somewhere between my house and China right now.

This is my porn.

[-]

SureTie253@reddit

Thats actually amazing :D i’m super noob about this but how do you use 3 gpus? I mean how can you add 32+32+32=96?

[-]

CalBearFan@reddit

llama.cpp works well with multiple GPUs, I have 4 3090s on a Supermicro server board and it works great across them giving a 96gb capability

[-]

MackThax@reddit (OP)

I just run a GGUF model with kobold.cpp and it just works. That is after installing the correct drivers and CUDA framework. And all the other shit I had to deal with.
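On the 32+32+32=96 question: the cards are not merged into one pool. Runtimes in the llama.cpp family place different layers on different GPUs, and llama.cpp exposes the ratio through its `--tensor-split` option. The sketch below only illustrates the idea of a proportional split; the function is hypothetical and is not how any particular runtime implements it.

```python
def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Assign whole layers to GPUs in proportion to each card's VRAM."""
    total = sum(vram_gb)
    counts = [int(n_layers * v / total) for v in vram_gb]
    # Layers lost to rounding down go to the largest cards first.
    leftover = n_layers - sum(counts)
    order = sorted(range(len(vram_gb)), key=lambda i: vram_gb[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Three 32GB cards and an 88-layer model (the layer count is an assumed example).
    print(split_layers(88, [32, 32, 32]))
```

Each card then only needs room for its own share of the weights plus its part of the KV cache, which is why three 32GB cards can hold a model that no single one could.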

[-]

kartblanch@reddit

I love the ghetto tech rise im seeing. Frankenstein machines to run llms are peak utilization of free will.

[-]

daMortarMerrier@reddit

Mine is almost that bad. Her name is "Abomination". Picture to follow.

[-]

jzzlr@reddit

looks like old cryptomining rigs, before crypto got all filthy. love it!

[-]

not_a_db_admin@reddit

the laptop sodimm in an adapter is somehow worse than the fan knob, and i mean it as a compliment

[-]

panchovix@reddit

How are the temps with that fan on the V100? I want to try a similar one for an A40.

[-]

MackThax@reddit (OP)

It got up to 82°C with a long job in Pi, with the fans on the lowest setting. Then I waddled my ass over and turned the knob halfway up and it got down to 60°C. The fans are more than enough. One can cool two GPUs no problem. I wanted as big of a fan as possible for reasons of noise. I still have to make a PWM controller for the fans, but I'm confident it will be pretty quiet when idle.
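The control logic for a PWM fan controller can be as simple as a linear fan curve. This is only a sketch of that logic: the temperature thresholds and duty limits are made-up defaults, and reading the GPU temperature and driving the actual PWM pin are left out because they depend on the hardware.

```python
def fan_duty(temp_c: float, t_min: float = 40.0, t_max: float = 80.0,
             duty_min: float = 0.2, duty_max: float = 1.0) -> float:
    """Linear fan curve: duty_min at or below t_min, duty_max at or above t_max."""
    if temp_c <= t_min:
        return duty_min
    if temp_c >= t_max:
        return duty_max
    frac = (temp_c - t_min) / (t_max - t_min)
    return duty_min + frac * (duty_max - duty_min)


if __name__ == "__main__":
    for temp in (35, 60, 82):
        print(temp, "C ->", round(fan_duty(temp) * 100), "% duty")
```

Keeping a non-zero minimum duty means the fans never stall at idle, which matters for blowers that refuse to spin below a certain voltage.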

[-]

panchovix@reddit

Nice! How much power do these V100 use, 300W as well?

[-]

MackThax@reddit (OP)

Max. 250W by default.

[-]

FailBait-@reddit

Ah I must have misremembered. 250 and then capping to 200 with minimal impact then.

[-]

FailBait-@reddit

Not OP but they can do up to 350, but you can power cap to 300 with only a 3-4% hit to performance with minimal fuss. I’ve done the same in my R740xd

[-]

ambient_temp_xeno@reddit

People have made coffee table books about weirder things than everyone's jank AI builds.

[-]

StardockEngineer@reddit

Looks fine to me.

[-]

-Ellary-@reddit

OP: Hi Everybody! I'm Johnny Catswill and tonight we gonna run TheDrummer_Behemoth-X-123B-v2.1-GGUF Q8

[-]

Colecoman1982@reddit

Eh, you've got a regular mid-tower case in there. I've seen much more ghetto setups, like hanging the motherboard, GPUs, and power supply off of a wire rack with a box fan blowing on it for cooling...

[-]

riconec@reddit

Have you considered nvlink between them? I am wondering how it improves speed

[-]

MackThax@reddit (OP)

I might be stupid, but I don't see where to plug the bridge in, on these cards.

[-]

riconec@reddit

Hard to describe but opposite side from pcie connector, there is a pcb visible with slits, nvlink bridge goes in there

[-]

MackThax@reddit (OP)

ahhhhh! I can see it! The backplate covers it. I probably have to remove it to expose it.

[-]

riconec@reddit

I think it is totally fine with the backplate, however I am not sure how hard it is to find those bridges (especially for 3 cards) and how pricey they are. What kind of models are you running on them? I was thinking of getting a V100 32GB and later expanding with one more over NVLink

[-]

MackThax@reddit (OP)

Haven't tested many models yet, but

https://www.reddit.com/r/LocalLLaMA/comments/1tpdt5m/comment/oo87bza/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

[-]

can999999999@reddit

Dude this is beautiful

[-]

GrapefruitMammoth626@reddit

Safe to say, we all appreciate this junk pile.

[-]

MackThax@reddit (OP)

I appreciate you <3

[-]

OwnerByDane@reddit

Does the RAM pose any issues? 16GB seems light

[-]

MackThax@reddit (OP)

Currently no, but oh boy did it. Capacity is not an issue, since everything is in VRAM.

[-]

smart4@reddit

All with just a 1200W PSU?

[-]

MackThax@reddit (OP)

Well, yeah, the GPUs are 250W and they don't all run 100% at the same time. Other components are negligible.
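The worst-case version of that power budget is quick to check. In this sketch the 100W allowance for everything that is not a GPU is an assumption, and the function name is made up.

```python
def psu_headroom_w(psu_w: float, gpu_count: int, gpu_cap_w: float,
                   other_w: float = 100.0) -> float:
    """Watts left over if every GPU hits its power cap at the same time."""
    return psu_w - (gpu_count * gpu_cap_w + other_w)


if __name__ == "__main__":
    print(psu_headroom_w(1200, 3, 250))  # three cards at the 250W default cap
    print(psu_headroom_w(1200, 4, 250))  # same PSU with a fourth card added
```

Transient spikes can exceed the rated cap for short periods, so the real margin is smaller than the steady-state figure suggests.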

[-]

Madman20201987@reddit

What’s the adaptor you're using for the DDR4 RAM?

[-]

MackThax@reddit (OP)

No idea lol. It was a surprise. I bought a small used PC just because it had RAM in it and when I opened it up - hello!

[-]

Ariquitaun@reddit

Janky and redneck. Love it.

[-]

sowerandreaper@reddit

Very nice! Would be interested to see your writeup and benchmarks!

[-]

dumbappsignup@reddit

I approve

[-]

Buildthehomelab@reddit

This reminds me of

[-]

MackThax@reddit (OP)

Ooh, wait, I'll dig up an old pic.

[-]

Buildthehomelab@reddit

oof thanks for trying

[-]

z3n777@reddit

ghetto ai is the best kind of ai

[-]

olliec42069@reddit

Actually that brings up a thought... why are there no hood models? I need one to speak to me in 90s/00s ebonics.

[-]

onephn@reddit

Ghetto ai is OUR ai

[-]

FatheredPuma81@reddit

Surely a big cardboard box would be better?

[-]

philmarcracken@reddit

im jelly, I want at least one Tesla V100 to pair with my 12gb 3080ti. Do you get the blower as part of the deal on one?

[-]

MackThax@reddit (OP)

Depends on what you find in the dumpster alongside it. Mine came without the case bracket even. No power adapters, no fans, nothing.

[-]

isopropoflexx@reddit

Hey, it's only weird if it doesn't work!

[-]

isopropoflexx@reddit

What model are you using with this setup, and what kind of (real world) TPS performance are you seeing?

I currently have a few individual LLM servers on my local network, with the primary being a multi-RTX 3090 build. It works really well but it's also fairly expensive. I've looked at the V100s many times as a possible option for another secondary/fringe LLM server to incorporate, but with them being built on an older architecture, from what I saw, the options on what to run on them are fairly limited. So I'm very interested to hear more about your experience so far.

[-]

fuck_cis_shit@reddit

scavenging DDR4 from laptops, truly a sign of the times

[-]

zhambe@reddit

It's perfect. All you need is your cat to take a piss in it.

[-]

MackThax@reddit (OP)

haha After all the crap I had to deal with, I wouldn't be surprised if that happened even though I don't have a cat.

[-]

zhambe@reddit

I was so proud when I finally got my build all buttoned up... week later, some spider decided it was a perfect place to start building webs.

[-]

zipperlein@reddit

That's the Spirit. ;D

[-]

MackThax@reddit (OP)

hahahaha love it

[-]

onephn@reddit

cable management isnt rats nest how dare u call this jank

all jokes aside this is such a cool setup, the 3d printed mounting hardware makes this look sick af

[-]

MackThax@reddit (OP)

Thanks! I'm especially fond of the 3D printed air duct propping up the top fan.

[-]

DeltaSqueezer@reddit

You have 3D printed parts and metal struts! That's practically professional! Look at attempts from a couple of years back when GPUs were just balanced in a pile 😂

[-]

MackThax@reddit (OP)

Oh I assure you, the balance of the parts is very precarious!

[-]

jarail@reddit

Nice build! We need more of this. Always inspiring how much people push forward the frontier of what can fit in a case.

[-]

MackThax@reddit (OP)

Fit? Inside? What do you mean "fit inside"?

[-]

jarail@reddit

Truly a mid-size tower.

[-]

LingonberryBorn2161@reddit

Interesting, same setup but with 5 P100s. But how is your setup not a noise farm? My blower fans are only full speed or nothing. They fail at <11V, which means full speed always, because the cards get hot even in idle mode..

[-]

MackThax@reddit (OP)

https://www.reddit.com/r/LocalLLaMA/comments/1tpdt5m/comment/oo828pi/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

[-]

lobopl@reddit

How many tokens/s do you achieve?

[-]

MackThax@reddit (OP)

Mistral 3.5, q4_M 128B: 9.8T/s, Mistral small, q4_M 24B: 39T/s.
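For context, a rough ceiling on decode speed can be estimated from memory bandwidth alone. This sketch assumes a dense model whose weights are all read once per token, about 4.8 bits per weight for a 4-bit medium quant, roughly 900 GB/s of memory bandwidth for a V100, and a layer split where the cards take turns rather than working in parallel. All of these are assumptions, not measurements from this build.

```python
def decode_ceiling_tps(params_b: float, bits_per_weight: float,
                       mem_bw_gbs: float) -> float:
    """Upper bound on tokens/s if each token requires reading all weights once."""
    model_gb = params_b * bits_per_weight / 8  # billions of params to GB
    return mem_bw_gbs / model_gb


if __name__ == "__main__":
    for name, params_b in (("24B", 24), ("128B", 128)):
        print(name, round(decode_ceiling_tps(params_b, 4.8, 900), 1), "t/s ceiling")
```

Under those assumptions the ceilings land around 62 t/s and 12 t/s, so the reported 39 t/s and 9.8 t/s are in a plausible range once compute and transfer overheads are counted.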

[-]

reto-wyss@reddit

That's a picture you can hear.

[-]

Littlepharaoh@reddit

Sounds like my tinnitus probably

[-]

Jetboy01@reddit

The only thing I'm hearing is the ai begging to be put out of its misery.

[-]

MackThax@reddit (OP)

Hahaha, it's actually *relatively* quiet. At least compared to a server blade. The fans are pretty big and need to barely spin when idle.

[-]

Fun_Assist7660@reddit

I think it’s pretty gangster

[-]

Bulky-Priority6824@reddit

damn one tipped iced tea away from a new build

[-]

MackThax@reddit (OP)

It's one sneeze away from falling over and bending important stuff.

[-]

grabber4321@reddit

Tell us whats the actual setup? Update the original post with specs!

[-]

MackThax@reddit (OP)

did do

[-]

HokkaidoNights@reddit

Ghettotech at its finest - I'm sick of seeing all these shiny rigs... this is the way!

[-]

Hedede@reddit

Why is the third V100 mounted outside? Is it because of the Delta blower? I find that a 120mm fan attached to the PCIe bracket is enough to cool multiple cards.

[-]

MackThax@reddit (OP)

Really? When used to 100%, they heat up a *lot*, quickly. I could try it.

There's another big fan inside the case, cooling those two cards. The third is up above because of packaging constraints. There was supposed to be a fourth one, but I'm having trouble with it still.

[-]

pot_sniffer@reddit

Ha this is unhinged, I love it

[-]

fizzy1242@reddit

damn that's outlandish, love it.

[-]

semangeIof@reddit

Beautiful