I am new to the LLM scene and I want to build a PC that can run models over 30B parameters. Price aside, what will be the best build? I want at least an RTX 4090 GPU; it doesn’t matter whether the CPU is AMD or Intel.
Posted by AmillieIO@reddit | LocalLLaMA | View on Reddit | 7 comments
I’m completely new to the scene and I just want to be able to run large models locally, super fast.
teachersecret@reddit
A 4090 plus any modern rig will run a 30B model no problem. The GPU is the biggest part of the equation; the CPU matters much less. Just get fast PCIe on the motherboard (don’t skimp), a solid CPU from the current Intel or AMD lineup, 64-128GB of the fastest RAM you can, and a high-speed SSD or two to store the models.
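For a rough sense of why a 24GB card handles a 30B model, here is a back-of-the-envelope VRAM sketch. The bits-per-weight figures for each quant are approximate, and the fixed overhead term standing in for the KV cache and CUDA buffers is a guess; real usage depends on the runtime and context length.

```python
# Back-of-the-envelope VRAM estimate for a quantized model.
# Assumption: weight memory ~= params * bits_per_weight / 8, plus a rough
# fixed overhead for KV cache and CUDA buffers (varies with context size).

def vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
    """Very rough VRAM needed for the weights plus fixed overhead, in GB."""
    weights_gb = params_b * bits_per_weight / 8  # billions of params * bits / 8 = GB
    return weights_gb + overhead_gb

for quant, bpw in [("Q8_0", 8.5), ("Q6_K", 6.6), ("Q4_K_M", 4.8)]:
    print(f"30B @ {quant}: ~{vram_gb(30, bpw):.1f} GB")
```

At roughly Q4, a 30B model lands around 20GB of weights, which is why it fits on a 24GB 4090 with a little room left over for context.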
liftingfrenchfries@reddit
What's the definition of "fast" PCI-E on the motherboard? Having all 16 lanes and at least PCIe 4.0?
If yes, then I assume I wouldn't need to upgrade my Ryzen 5600x for local LLM with an RTX 4090?
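For what it’s worth, the 5600X already runs its GPU slot at PCIe 4.0 x16, and once the model is fully offloaded to VRAM the link speed mostly affects load time rather than tokens per second. A quick sketch of that bandwidth math (nominal peak numbers; real transfers are slower):

```python
# Why the PCIe link mostly matters for loading: once the weights sit in VRAM,
# generation barely touches the bus. Nominal peak bandwidths; real-world is lower.

model_gb = 20  # e.g. a ~30B model at a Q4/Q5 quant

pcie_links = {
    "PCIe 3.0 x16": 16,  # GB/s, approximate peak
    "PCIe 4.0 x16": 32,
    "PCIe 5.0 x16": 64,
}

for link, gb_per_s in pcie_links.items():
    print(f"{link}: ~{model_gb / gb_per_s:.1f} s to copy {model_gb} GB of weights to the GPU")
```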
Downtown-Case-1755@reddit
Build a cheap PC around an A6000, basically.
Terminator857@reddit
Wait a month and you will get the much better 5090.
ButtlessFucknut@reddit
There’s a scene? Why did no one tell me there was a scene?
randomqhacker@reddit
If you're dead set on an RTX 4090 or above, just wait and get the RTX 5090 in a month. It's not that much more expensive, will have 32GB of VRAM, and will be way faster.
That would open up Q6_K quants of 32B models and IQ2 quants of 70B, or lower quants with a lot more context.
Get a recent processor and at least 32GB of RAM so you can keep your models cached, and a fast PCIe 5 NVMe drive to load models quickly.
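A quick sanity check on those quant sizes (the bits-per-weight values are approximate for llama.cpp-style GGUF quants, and the KV cache for context comes on top):

```python
# Rough weight-file sizes for the quants mentioned above.
# Approximate bits-per-weight; actual GGUF files vary a little.

def weights_gb(params_b: float, bpw: float) -> float:
    return params_b * bpw / 8  # billions of params * bits / 8 = GB

print(f"32B @ Q6_K   (~6.6 bpw): ~{weights_gb(32, 6.6):.0f} GB")
print(f"70B @ IQ2_XS (~2.4 bpw): ~{weights_gb(70, 2.4):.0f} GB")
```

Both land under 32GB, but the 32B Q6_K at roughly 26GB would not fit on a 24GB card, which is the practical difference the extra VRAM buys.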
teachersecret@reddit
Tacking on… define “super fast”.
30B models run quickly on a 4090, but I wouldn’t call them “superfast” by any means: 30-60 tokens per second or so. That’s totally usable and quick enough for most people, but it’s probably worth managing your expectations (when I think “superfast” I think of things like batched inference or Groq pushing thousands of tokens per second).
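If you want to check the tokens-per-second figure on your own hardware, here is a minimal sketch assuming llama-cpp-python is installed and pointed at a local GGUF file (the model path below is a placeholder, not a real file):

```python
# Minimal generation-speed check with llama-cpp-python (assumed installed).
import time
from llama_cpp import Llama

# Placeholder path: substitute whatever 30B-class GGUF you actually downloaded.
llm = Llama(model_path="./models/your-30b-model.Q4_K_M.gguf",
            n_gpu_layers=-1,   # offload all layers to the GPU
            n_ctx=4096)

start = time.perf_counter()
result = llm("Explain what a token is, in one short paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = result["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f} s -> {generated / elapsed:.1f} tok/s")
```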