What is the largest home GPU cluster running LLMs?
Posted by badabimbadabum2@reddit | LocalLLaMA | 13 comments
Hi,
I am interested in running very large models on multiple GPUs connected to one computer. I have seen someone with 10 7900 XTXs connected to one consumer-level motherboard with risers. So far I have tried no more than 3, which gives 72GB of VRAM. The inference speed for Llama 3.3 70B was quite good, so I was wondering: are there models of around 300GB that could be run on 13 GPUs? By my count I could attach 13 7900 XTXs to my consumer AM5 board with risers. Is anyone here running GPU clusters built with risers, and how large are they?
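The VRAM arithmetic behind those numbers can be sketched as below. This is only a rough check, assuming 24GB per 7900 XTX (consistent with 3 cards giving 72GB). The `overhead_fraction` parameter is a hypothetical placeholder for KV cache, activations and runtime buffers, not a measured value.

```python
def total_vram_gb(num_gpus, vram_per_gpu_gb=24):
    """Combined VRAM across identical cards (24 GB each assumed for the 7900 XTX)."""
    return num_gpus * vram_per_gpu_gb


def fits(model_size_gb, num_gpus, vram_per_gpu_gb=24, overhead_fraction=0.10):
    """True if the weights fit after reserving a fraction of VRAM for overhead.

    overhead_fraction is an illustrative guess, not a measured figure.
    """
    usable = total_vram_gb(num_gpus, vram_per_gpu_gb) * (1 - overhead_fraction)
    return model_size_gb <= usable


if __name__ == "__main__":
    print(total_vram_gb(3))    # the 3-card setup from the post
    print(total_vram_gb(13))   # the proposed 13-card setup
    print(fits(300, 13))       # 300 GB of weights with 10% reserved
```

With 13 cards the total is 312GB, so a 300GB model leaves only about 12GB spread across all cards for context and buffers. Whether that is enough depends on the runtime and the context length.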
I am interested in how much inference speed slows down as model size grows, e.g. 70B -> 300B, when the model still fits entirely in VRAM. I am not planning to run anything on the CPU or in system RAM.
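One rough way to frame the slowdown question is a memory-bandwidth estimate. It is a back-of-envelope sketch, not a benchmark. It assumes a dense model split by layers, so the cards work one after another and every generated token reads all the weights once. The 960 GB/s default is the 7900 XTX spec-sheet bandwidth, and the 40GB figure for a quantized 70B model is an illustrative assumption. Transfers over the risers, KV cache reads and mixture-of-experts models are all ignored.

```python
def decode_tokens_per_s_upper_bound(model_size_gb, bandwidth_gb_s=960.0):
    """Upper bound on single-stream decode speed if reading weights is the only cost."""
    return bandwidth_gb_s / model_size_gb


def relative_slowdown(small_model_gb, large_model_gb):
    """How many times slower the larger model is under the same assumptions."""
    return large_model_gb / small_model_gb


if __name__ == "__main__":
    # 40 GB is an assumed size for a 4-bit-ish 70B model, not a measured one.
    print(decode_tokens_per_s_upper_bound(40))
    print(decode_tokens_per_s_upper_bound(300))
    print(relative_slowdown(40, 300))
```

Under these assumptions the speed scales inversely with the size of the weights in memory, so going from 70B to 300B at the same quantization would be roughly 4x slower per token. Real numbers will be lower than the bound and depend heavily on the backend and split mode.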
13 Comments
publicbsd@reddit
theobjectivedad@reddit
LicensedTerrapin@reddit
ForsookComparison@reddit
FullstackSensei@reddit
MrMisterShin@reddit
FullstackSensei@reddit
No_Afternoon_4260@reddit
Mass2018@reddit
sitmo@reddit
joninco@reddit
SillyLilBear@reddit
ArsNeph@reddit