How far would AMD Threadripper 3600 (24 core, 48 threads) and 256 GB of memory get me for running local LLMs?
Posted by x3derr8orig@reddit | LocalLLaMA | 30 comments
I am thinking about buying a Threadripper this Black Friday to set up a local LLM inference machine. I would add a graphics card (or a few) when the budget allows, but this could be a start, no?
My reasoning is "low" power consumption (280W), 256 GB of memory (enough to spare for some other tasks), and the possibility to upgrade down the road. Without any discounts, this would be around 2200 euros (with cooling, case, a ton of disk space, etc., the whole package). I hope I can bring this down to 2000 euros or lower.
Does this make sense or am I delusional?
cookerz30@reddit
My Alibaba EPYC has been cruising along just fine since last October.
Terminator857@reddit
What model size? How many tokens per second?
LoafyLemon@reddit
Bro is still waiting for results
Terminator857@reddit
3 tokens per day? :P
KvAk_AKPlaysYT@reddit
A token a day keeps away
Barafu@reddit
Not far. In fact, a low-tier new Ryzen with heavily overclocked RAM will be much faster. But not as fast as a pair of 3090s.
DataGOGO@reddit
Can you explain your logic on the memory?
Even with the fastest overclocked memory (assuming it is truly stable), a true 4-channel setup will still have twice the memory bandwidth.
Sufficient_Prune3897@reddit
4x DDR4-3200 vs 2x DDR5-8000. In theory DDR5 should be faster.
DataGOGO@reddit
Why only 3200? That generation has no problem running at least 3800; even my 1950X ran 4x 3600 C15 1T and had read/write bandwidth over 131k.
DDR5-8000, IF you can get it truly stable, is what? 118k?
Sufficient_Prune3897@reddit
For reference, a 4090 has 1000k and even a 3060 has 360k.
DataGOGO@reddit
Oh for sure, no argument from me on GPU vs dram.
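For anyone who wants to sanity-check the figures being thrown around in this exchange, here's a quick back-of-the-envelope sketch of theoretical peak bandwidth (channels × transfer rate × 8 bytes per transfer); real-world measured numbers land somewhat below these:

```python
# Theoretical peak memory bandwidth: channels * transfer rate (MT/s) * 8 bytes per transfer.
def peak_bandwidth_gbs(channels: int, mt_per_s: int) -> float:
    return channels * mt_per_s * 8 / 1000

configs = {
    "4x DDR4-3200 (stock Threadripper 3960X)": (4, 3200),
    "4x DDR4-3600 (overclocked quad channel)": (4, 3600),
    "2x DDR5-8000 (overclocked dual channel)": (2, 8000),
}

for name, (channels, speed) in configs.items():
    print(f"{name}: ~{peak_bandwidth_gbs(channels, speed):.0f} GB/s")

# Roughly 102, 115, and 128 GB/s respectively.
# For comparison, a 3060's VRAM is ~360 GB/s and a 4090's is ~1008 GB/s.
```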
EmilPi@reddit
I guess you mean a 3960X Threadripper.
You can get 3970X results https://www.reddit.com/r/LocalLLaMA/comments/1erh260/2x_rtx_3090_threadripper_3970x_256gb_ram_llm/ (CPU-only results)
Also, notice that I bought a used mobo + used 3970X for 1000 euros, and it works just fine. For 2200 you'd be better off buying 2-3 RTX 3090s...
Terminator857@reddit
Intel Granite Rapids might be a better bet.
https://www.phoronix.com/review/intel-xeon-6980p-performance/10
koalfied-coder@reddit
This is like recommending the world's best rollerblades for cross-country travel. Can it be done? Sure. Should it be done when 3090s are so cheap? Probably not.
koalfied-coder@reddit
Just for clarification, RAM is not very important for LLMs. VRAM is the wave. Get a nice Lenovo P620 and thank me later. Slap in two 3090 Turbos, an A5000, or an A6000 and cook!!!
koalfied-coder@reddit
I HIGHLY recommend a used Lenovo P620 workstation with an A-series card of your choice. They can be had for less than 700 with a Threadripper and the works. It's crazy how good they are.
TheNotSoEvilEngineer@reddit
Get some cheap P40s; they work fine for most LLMs. 24GB of VRAM. You just have to work on adding active cooling to them.
CaptParadox@reddit
You're better off upgrading your video card than relying on the CPU, sadly. (5950X, 16 cores / 32 threads here, with a 3070 Ti)
StableLlama@reddit
Running a neural network is a massively parallel task, and each individual operation is very simple. This is something GPU cores are perfectly suited for; the complexity a CPU core can handle is overkill.
So it just comes down to the numbers: compare the core counts on the different systems and you'll know which one is faster and by how much.
Terminator857@reddit
Not relevant. The bottleneck is memory bandwidth.
BlueSwordM@reddit
A 3960X would be fine, if a bit excessive for just running LLMs.
Do note you'll only have ~100GB/s of memory bandwidth accessible, since you only have access to quad-channel DDR4-3200 with that amount of RAM.
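That bandwidth figure translates fairly directly into generation speed. A rough sketch, assuming the usual rule of thumb that memory-bound inference streams all (dense) model weights from RAM for every generated token, and using approximate Q4 quantized model sizes:

```python
# Rule of thumb for memory-bound CPU inference:
#   tokens/s ~= usable memory bandwidth / bytes read per token
# (which is roughly the model file size for a dense model).
def tokens_per_second(bandwidth_gbs: float, model_size_gb: float) -> float:
    return bandwidth_gbs / model_size_gb

bandwidth_gbs = 100  # quad-channel DDR4-3200, as above

for name, size_gb in [("8B Q4 (~5 GB)", 5), ("32B Q4 (~20 GB)", 20), ("70B Q4 (~40 GB)", 40)]:
    print(f"{name}: ~{tokens_per_second(bandwidth_gbs, size_gb):.1f} tok/s")

# 8B Q4 (~5 GB): ~20.0 tok/s
# 32B Q4 (~20 GB): ~5.0 tok/s
# 70B Q4 (~40 GB): ~2.5 tok/s  (slower than comfortable reading speed)
```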
FullstackSensei@reddit
Get an Epyc. Same specs, much cheaper. You can build a 64-96 core, 16-memory-channel (375-400GB/sec), dual-CPU Epyc of the same generation as that Threadripper for about half the price if you're a bit patient. You'll also get additional nice features like a 10GbE NIC and remote management for free.
khrizp@reddit
Wasn’t it 12 channel per CPU?
FullstackSensei@reddit
8 channels of DDR4 on SP3 (up to Milan), 12 channels of DDR5 on SP5 (Genoa and later). Rome Epyc is currently the sweet spot for hobbyists. 48-core CPUs (with 256MB cache) cost under 500$/€, and dual-CPU motherboards can be had for around 300-350$/€ (for 1st gen Epyc motherboards, which also support 2nd gen, but the PCIe slots run at 3.0 speed). 2933 or 3200 ECC DDR4 costs a little over 1$/€ per GB. SP3 is the same socket as TR4/sWRX8 as far as CPU coolers are concerned, and there are plenty of those for cheap thanks to how popular first-gen Threadripper was.
khrizp@reddit
How many CCDs? I remember reading that impacts performance, and I also remember reading that we don't really add the memory together since it's on different CPUs.
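To show roughly where that 375-400GB/sec figure comes from, here's the same peak-bandwidth math applied to a dual-socket Rome setup. Note the total is an aggregate across two NUMA nodes, so a single NUMA-unaware process won't see all of it:

```python
# Dual-socket Rome: 8 DDR4 channels per CPU, 8 bytes per transfer.
def peak_bandwidth_gbs(sockets: int, channels_per_socket: int, mt_per_s: int) -> float:
    return sockets * channels_per_socket * mt_per_s * 8 / 1000

print(f"2x Rome, DDR4-2933: ~{peak_bandwidth_gbs(2, 8, 2933):.0f} GB/s")  # ~375 GB/s
print(f"2x Rome, DDR4-3200: ~{peak_bandwidth_gbs(2, 8, 3200):.0f} GB/s")  # ~410 GB/s
```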
CMDR_CHIEF_OF_BOOTY@reddit
You'll be able to run any popular LLM. It'll just be slow as hell. Even with a newer CPU like the Ryzen 7950X, it's just tolerable for smaller models. Not a bad option long term, but if I had to do CPU inferencing I'd go with an AM5 ASUS ProArt motherboard and a 7950X or 9950X CPU.
Because of that, I ended up building an X99 Xeon workstation to set up as an LLM rig. It was like $800 not including any GPUs. Sure, it's not the fastest, but once the models are loaded into the GPUs' VRAM that stops mattering. Just keep in mind you HAVE to do GPU-only inferencing on an X99 rig; these CPUs will never be fast enough to inference in a usable way.
Psychological_Ear393@reddit
I have a 7950X with 4 sticks because I need 128GB for other things, and Llama 3.2 3B is perfectly usable for inference in WSL through ollama/open webui. And that's at 3800 MT/s.
Echo9Zulu-@reddit
Budget-wise, it wouldn't be worth cheaping out on a board that sacrifices even some of the bells and whistles. Higher-density memory would also be a requirement to populate all your channels and maximize throughput.
Also, this is the wrong setup for prioritizing low power consumption; high load is to be expected from this hardware, even more so with inference tasks. You want a beefcake PSU with at minimum a Platinum efficiency rating, which should be standard above the 1600W level. That, memory, and a high-quality board would be good candidates for Black Friday deals in the long term. A build like this takes time to spec out, so you should not full-send it on the first good-looking deal; measure a "deal" by specs first, then price, and choose any used parts carefully.
It's always a mixed bag with used parts, but if you source them properly you could easily get a higher-end board and find the right deal on a Threadripper. Maybe I'm off by a long shot. If your use case demands inference and other shenanigans at the same time, then fucking full-send it and get all the compute some poor mail carrier can schlep to your front door.
SwordsAndElectrons@reddit
Going to assume you mean a 3960X.
Quad channel DDR4 is only going to be in the same ballpark as dual channel DDR5 for bandwidth, and that's going to limit your performance. A newer platform with overclocked memory would probably be a bit faster, I think.
The best thing it has going for it would be the available number of PCIe lanes, but that comes into play if you start adding GPUs.
kiselsa@reddit
Running anything larger than 70B (64 GB of RAM) will be too slow to read in real time. And even 70B will not be particularly fast either.
Better to spend your money on some used 3090s.