What is the largest GPU home cluster running LLMs?

Posted by badabimbadabum2@reddit | LocalLLaMA | 13 comments

Hi, I am interested in running very large models with multiple GPUs connected to one computer. I have seen someone with 10 7900 XTXs connected to one consumer-level motherboard with risers. So far I have tried no more than 3, which gives 72GB of VRAM. The inference speed for Llama 3.3 70B was quite good, so I was wondering whether there are models around 300GB that could be run on 13 GPUs. By my count I could attach 13 7900 XTXs to my consumer AM5 board with risers. Does anyone here have a GPU cluster built with risers, and how large is it? I am interested in how much inference speed slows down as the model size grows, say 70B -> 300B, if the model still fits in VRAM. I am not planning to run anything on CPU or system RAM.
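For the VRAM arithmetic in the question, here is a rough sizing sketch. It assumes 24GB per 7900 XTX (consistent with 3 cards giving 72GB) and reserves a hypothetical 15% of each card for KV cache and runtime buffers; that fraction is a guess and will vary with context length and backend. The function names are made up for illustration.

```python
import math

GPU_VRAM_GB = 24.0  # 7900 XTX; 3 cards give the 72GB mentioned above


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def gpus_needed(params_billion: float, bits_per_weight: float,
                vram_per_gpu_gb: float = GPU_VRAM_GB,
                overhead_fraction: float = 0.15) -> int:
    """GPUs required when overhead_fraction of each card is reserved (assumed value)."""
    usable_gb = vram_per_gpu_gb * (1.0 - overhead_fraction)
    return math.ceil(weights_gb(params_billion, bits_per_weight) / usable_gb)


# Model sizes and quantization levels here are illustrative inputs only.
for params, bits in [(70, 4), (70, 16), (405, 4), (405, 8)]:
    print(f"{params}B @ {bits}-bit: ~{weights_gb(params, bits):.0f} GB weights, "
          f"{gpus_needed(params, bits)} x 24GB GPUs")
```

This only says whether the weights fit; it says nothing about how speed scales, which depends on how the layers are split across cards and on the interconnect.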

publicbsd@reddit

I hope you also have a large solar array.

theobjectivedad@reddit

This isn’t going to get you close to 300GB, but I’m running a Lambda Vector with 4x A6000s for my research and have been mostly happy after 2 years. I’m running Llama 3.3 70B at full BF16 via vLLM. My inference use cases are mostly batches of synthetic data generation tasks, and I get around 200-300 response tokens/sec depending on the workload.

LicensedTerrapin@reddit

So... You're running Twitter bots? 🤣

ForsookComparison@reddit

> I'm running a Lambda Vector

My mid-tier gaming rig that's been forced to house an extra GPU is looking at you with so much envy right now.

FullstackSensei@reddit

What is this 300GB model that you want to run? And do you actually have a use case that justifies this?

MrMisterShin@reddit

DeepSeek V3, that’s probably the one.

FullstackSensei@reddit

You can run it at a decent speed on a dual EPYC Rome system for the cost of 2-3 7900 XTXs, with zero dangling cables.
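A back-of-the-envelope sketch of why a CPU box can be viable here: single-stream decoding is mostly limited by memory bandwidth, so an upper bound on tokens/sec is roughly bandwidth divided by the bytes of weights read per token. The numbers below are assumptions, not measurements: DeepSeek V3 is a mixture-of-experts model with about 37B activated parameters per token, EPYC Rome has 8 channels of DDR4-3200 per socket, and the 4-bit quantization and 50% efficiency factor are hypothetical placeholders.

```python
def peak_bandwidth_gbs(channels_per_socket: int = 8, mega_transfers: int = 3200,
                       sockets: int = 2) -> float:
    """Theoretical DRAM bandwidth in GB/s; each DDR4 channel is 8 bytes wide."""
    return channels_per_socket * mega_transfers * 8 / 1000 * sockets


def decode_tokens_per_s(active_params_billion: float, bits_per_weight: float,
                        bandwidth_gbs: float, efficiency: float = 1.0) -> float:
    """Bandwidth-bound ceiling: every active weight is read once per token."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gbs * efficiency / gb_read_per_token


bw = peak_bandwidth_gbs()  # dual socket, theoretical peak
print(f"peak bandwidth: {bw:.1f} GB/s")
# 37B active parameters, 4-bit weights and 50% efficiency are all assumed values
print(f"ceiling: {decode_tokens_per_s(37, 4, bw):.1f} tok/s")
print(f"at 50% efficiency: {decode_tokens_per_s(37, 4, bw, efficiency=0.5):.1f} tok/s")
```

Real numbers will land below this ceiling because of NUMA effects across the two sockets, KV cache reads, and prompt processing, which is compute-bound rather than bandwidth-bound.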

No_Afternoon_4260@reddit

Have you tried it? What spec and speeds are you getting?

Mass2018@reddit

My current aspirational build I've been eyeing is:

- Motherboard: ROME2D32GM-2T (currently about $1800)
- CPU: 2xEPYC 7K62 ($700)
- RAM: 1TB ($3200)

Not including PSU, hard drives, cabling, cooling, etc., you're looking at about $6.5k, which will let you connect up 19 GPUs at 8x PCIe 4.0.
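To put the "19 GPUs at 8x PCIe 4.0" figure in numbers, here is a small sketch of the lane count and per-link bandwidth. It uses the PCIe 4.0 signaling rate of 16 GT/s per lane with 128b/130b encoding; the function names are hypothetical, and the result is a theoretical per-direction figure before protocol overhead.

```python
def pcie_link_gbs(lanes: int, gt_per_s: float = 16.0,
                  encoding: float = 128 / 130) -> float:
    """Theoretical one-direction bandwidth of a PCIe link in GB/s."""
    return lanes * gt_per_s * encoding / 8  # 8 bits per byte


def total_lanes(gpus: int, lanes_per_gpu: int) -> int:
    """CPU lanes consumed if every GPU gets its own dedicated link."""
    return gpus * lanes_per_gpu


print(f"x8 link:  {pcie_link_gbs(8):.2f} GB/s per direction")
print(f"x16 link: {pcie_link_gbs(16):.2f} GB/s per direction")
print(f"19 GPUs at x8 use {total_lanes(19, 8)} lanes")
```

Link bandwidth matters most for model loading and for tensor-parallel setups that exchange activations every layer; a pipeline-style layer split moves far less data between cards per token.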

sitmo@reddit

This machine can handle 32 GPUs connected to one computer [https://gigaio.com/supernode/](https://gigaio.com/supernode/)

joninco@reddit

You should look at runpod.io. Rent the configuration you are thinking about owning, or a similar one. Tinker for a few bucks and see if it's worth owning. I changed my mind on buying an A100 after I spent 6 dollars on various configurations and realized a 3090 with a lower quant works fantastically.

SillyLilBear@reddit

I'd use Together and compare it against Claude to see if you even get the results you want with large models. I have yet to be impressed even with 405B models compared to ChatGPT/Claude.

ArsNeph@reddit

Probably this monstrosity: [https://www.reddit.com/r/LocalLLaMA/comments/1hi24k9/home\_server\_final\_boss\_14x\_rtx\_3090\_build/](https://www.reddit.com/r/LocalLLaMA/comments/1hi24k9/home_server_final_boss_14x_rtx_3090_build/)