I am new to the LLM scene and I want to build a PC that can run models over 30B parameters. Price aside, what will be the best build? I want at least an RTX 4090 GPU; it doesn’t matter whether the CPU is AMD or Intel.
Posted by AmillieIO@reddit | LocalLLaMA | View on Reddit | 7 comments
I’m completely new to the scene and I just want to be able to run large models locally, super fast.
teachersecret@reddit
A 4090 plus any modern rig will run a 30B model no problem. The GPU is the biggest part of the equation; the CPU matters much less. Just get fast PCIe on the motherboard (don’t skimp), a solid CPU from the current Intel or AMD lineup, 64-128GB of the fastest RAM you can, and a high-speed SSD or two to store the models.
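For a rough sense of why a 24GB card handles a 30B model, here is a back-of-the-envelope VRAM sketch. The bits-per-weight figures for each quant are approximate, and the fixed overhead term standing in for the KV cache and CUDA buffers is a guess; real usage depends on the runtime and context length.

```python
# Back-of-the-envelope VRAM estimate for a quantized model.
# Assumption: weight memory ~= params * bits_per_weight / 8, plus a rough
# fixed overhead for KV cache and CUDA buffers (varies with context size).

def vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
    """Very rough VRAM needed for the weights plus fixed overhead, in GB."""
    weights_gb = params_b * bits_per_weight / 8  # billions of params * bits / 8 = GB
    return weights_gb + overhead_gb

for quant, bpw in [("Q8_0", 8.5), ("Q6_K", 6.6), ("Q4_K_M", 4.8)]:
    print(f"30B @ {quant}: ~{vram_gb(30, bpw):.1f} GB")
```

At roughly Q4, a 30B model lands around 20GB of weights, which is why it fits on a 24GB 4090 with a little room left over for context.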
liftingfrenchfries@reddit
What's the definition of "fast" PCI-E on the motherboard? Having all 16 lanes and at least PCIe 4.0?
If yes, then I assume I wouldn't need to upgrade my Ryzen 5600x for local LLM with an RTX 4090?
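For what it’s worth, the 5600X already runs its GPU slot at PCIe 4.0 x16, and once the model is fully offloaded to VRAM the link speed mostly affects load time rather than tokens per second. A quick sketch of that bandwidth math (nominal peak numbers; real transfers are slower):

```python
# Why the PCIe link mostly matters for loading: once the weights sit in VRAM,
# generation barely touches the bus. Nominal peak bandwidths; real-world is lower.

model_gb = 20  # e.g. a ~30B model at a Q4/Q5 quant

pcie_links = {
    "PCIe 3.0 x16": 16,  # GB/s, approximate peak
    "PCIe 4.0 x16": 32,
    "PCIe 5.0 x16": 64,
}

for link, gb_per_s in pcie_links.items():
    print(f"{link}: ~{model_gb / gb_per_s:.1f} s to copy {model_gb} GB of weights to the GPU")
```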
Downtown-Case-1755@reddit
Build a cheap PC around an A6000, basically.
Terminator857@reddit
Wait a month and you will get the much better 5090.
ButtlessFucknut@reddit
There’s a scene? Why did no one tell me there was a scene?
randomqhacker@reddit
If you're dead set on an RTX 4090 or above, just wait and get the RTX 5090 in a month. It's not that much more expensive, will have 32GB of VRAM, and will be way faster.
That would open up Q6_K quants of 32B models and IQ2 quants of 70B, or lower quants with a lot more context.
Get a recent processor and at least 32GB of RAM so you can keep your models cached, and a fast PCIe 5 NVMe drive to load models quickly.
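A quick sanity check on those quant sizes (the bits-per-weight values are approximate for llama.cpp-style GGUF quants, and the KV cache for context comes on top):

```python
# Rough weight-file sizes for the quants mentioned above.
# Approximate bits-per-weight; actual GGUF files vary a little.

def weights_gb(params_b: float, bpw: float) -> float:
    return params_b * bpw / 8  # billions of params * bits / 8 = GB

print(f"32B @ Q6_K   (~6.6 bpw): ~{weights_gb(32, 6.6):.0f} GB")
print(f"70B @ IQ2_XS (~2.4 bpw): ~{weights_gb(70, 2.4):.0f} GB")
```

Both land under 32GB, but the 32B Q6_K at roughly 26GB would not fit on a 24GB card, which is the practical difference the extra VRAM buys.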
teachersecret@reddit
Tacking on… define “super fast”.
30B models run quickly on a 4090, but I wouldn’t call them “superfast” by any means: 30-60 tokens per second or so. That’s totally usable and quick enough for most people, but it’s probably worth managing your expectations (when I think “superfast” I think of things like batched inference or Groq pushing thousands of tokens per second).
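If you want to check the tokens-per-second figure on your own hardware, here is a minimal sketch assuming llama-cpp-python is installed and pointed at a local GGUF file (the model path below is a placeholder, not a real file):

```python
# Minimal generation-speed check with llama-cpp-python (assumed installed).
import time
from llama_cpp import Llama

# Placeholder path: substitute whatever 30B-class GGUF you actually downloaded.
llm = Llama(model_path="./models/your-30b-model.Q4_K_M.gguf",
            n_gpu_layers=-1,   # offload all layers to the GPU
            n_ctx=4096)

start = time.perf_counter()
result = llm("Explain what a token is, in one short paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = result["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f} s -> {generated / elapsed:.1f} tok/s")
```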