What is the best value card I could buy for decent performance?
Posted by equinoxel@reddit | LocalLLaMA | 22 comments
I have an ancient 1080 that I use now with 7B-ish models, and I'm thinking of an upgrade, mainly to run larger models. My use case is running an embedding model alongside a normal one, and I don't mind switching the "normal" model depending on the task (coding vs chatbot). I was looking for a comparison of different cards and their performance, but couldn't find one that gives OS/GPU/tokens-per-second and ideally median price. So I'm wondering about the new 9060/9070 from AMD and the 16GB Intel ones. Is it worth getting a GPU vs the 395 Max/128GB or Nvidia's golden box thing (DIGITS)?
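For reference, here's roughly what I mean by running both at once - a minimal sketch assuming llama-cpp-python, with placeholder model files:

```python
# Sketch: an embedding model resident alongside a swappable chat model.
# Model paths are placeholders; n_gpu_layers=-1 offloads all layers to GPU.
from llama_cpp import Llama

embedder = Llama(
    model_path="nomic-embed-text-v1.5.Q8_0.gguf",  # small, always loaded
    embedding=True,
    n_gpu_layers=-1,
)
chat = Llama(
    model_path="qwen2.5-7b-instruct-q4_k_m.gguf",  # swapped per task
    n_gpu_layers=-1,
    n_ctx=8192,
)

vec = embedder.create_embedding("some document chunk")["data"][0]["embedding"]
reply = chat.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize that chunk for me."}]
)
print(len(vec), reply["choices"][0]["message"]["content"][:80])
```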
Daniokenon@reddit
"395 max/128g or nvidia's golden box" there are no reliable tests of how it works in practice yet, so I would wait.
colin_colout@reddit
People are reporting slow prompt processing with the 395. DIGITS will be similar, and will likely be harder to tinker with for a while due to the proprietary design.
terrafoxy@reddit
could be because it's still early.
colin_colout@reddit
Apparently it's a problem across all of ROCm on Linux. Might be different on Windows.
Daniokenon@reddit
Too bad... On paper the specs are impressive, especially against the stagnant x86 competition. If it had PCIe slots, a card like the 7900xtx or 9070xt could help a lot.
Normally, offloading part of the model into RAM doesn't pay off; I wonder how it would go here - probably much better.
terrafoxy@reddit
https://llm-tracker.info/_TOORG/Strix-Halo
My_Unbiased_Opinion@reddit
The 3090, 7900XTX, and P40 are still viable if you can get them at a decent price.
NegativeCrew6125@reddit
I would caution against buying a P40 because it will likely lose CUDA support soon; using one long-term will mean messing with older pytorch versions.
Massive-Question-550@reddit
The P40 likely isn't worth it; you'd be better off with a 3060 or 5060 Ti 16GB.
My_Unbiased_Opinion@reddit
Depends on price imho. Qwen3 30B A3B can run fast on a P40 with a solid quant. You would need a really low quant to fit a 30B on a 16GB card with decent context.
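Back-of-the-envelope on why 16GB is tight (rough sketch; the KV-cache constant is a ballpark assumption and real GGUF sizes vary by model):

```python
# Rough VRAM needed = weights at a given quant width + KV cache for context.
def est_vram_gb(params_b, bits_per_weight, ctx_tokens, kv_mb_per_token=0.3):
    weights_gb = params_b * bits_per_weight / 8  # e.g. 30B @ 4 bpw = 15 GB
    kv_gb = ctx_tokens * kv_mb_per_token / 1024  # KV cache grows with context
    return weights_gb + kv_gb

print(est_vram_gb(30, 4.5, 8192))       # Q4_K_M-ish: ~19 GB -> doesn't fit 16 GB
print(est_vram_gb(30, 3.0, 8192))       # Q2/Q3 territory: ~14 GB -> barely fits
print(est_vram_gb(30, 4.5, 8192) < 24)  # True: comfortable on a 24 GB P40
```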
oodelay@reddit
3090 hands down.
terrafoxy@reddit
https://www.amazon.com/GMKtec-ryzen_ai_mini_pc_evo_x2/dp/B0F53MLYQ6
128GB RAM
Massive-Question-550@reddit
This is the way.
suprjami@reddit
Dual 3060 12G
24B Q6 and 32B Q4 at 15 tok/sec for half the cost of a 3090
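If anyone wants to replicate it, splitting one model across both cards looks roughly like this (a sketch with llama-cpp-python; the model path is a placeholder):

```python
from llama_cpp import Llama

# ~18 GB of Q4 weights split evenly across two 12 GB 3060s.
llm = Llama(
    model_path="qwen2.5-32b-instruct-q4_k_m.gguf",  # placeholder path
    n_gpu_layers=-1,          # offload every layer
    tensor_split=[0.5, 0.5],  # equal share per GPU
    n_ctx=4096,
)
print(llm("Q: What is 2+2?\nA:", max_tokens=8)["choices"][0]["text"])
```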
Massive-Question-550@reddit
If you can get it at MSRP, the 5060 Ti 16GB might be a better option: more power efficient, much faster prompt processing, faster VRAM, and more VRAM per card. It's also OK for gaming, and you'll have a warranty.
AppearanceHeavy6724@reddit
Prompt processing on the 3060 is good enough - about 600 t/s for a 32B model at empty context. The 5060 Ti's bandwidth is not much better than the 3060's. I will still buy a 5060 Ti (once the price on my local market goes below 500 USD) to complement my 3060, but only because I'll be using it for non-LLM tasks where the better compute makes a difference: purely for LLMs, a 5060 Ti is a waste compared to a 3060.
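The bandwidth point in numbers (simplified: decode is memory-bound, so the tok/s ceiling is roughly bandwidth divided by bytes read per token; datasheet bandwidths, ballpark model size):

```python
# Decode speed ceiling ~= memory bandwidth / model weights read per token.
def max_tok_s(bandwidth_gb_s, model_size_gb):
    return bandwidth_gb_s / model_size_gb

size_32b_q4 = 18  # ~GB of weights for a 32B model at Q4
print(max_tok_s(360, size_32b_q4))  # 3060 (360 GB/s): ~20 tok/s ceiling
print(max_tok_s(448, size_32b_q4))  # 5060 Ti (448 GB/s): ~25 tok/s - not a big jump
```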
custodiam99@reddit
VRAM is the most important thing. I had a 12GB card, but I bought a 24GB card and it's a whole new world. Try the RX 7900XTX 24GB - I've had no problems with it using LM Studio on Windows 11.
iwinux@reddit
Is it possible to install 2x RX 7900XTX to have 48GB VRAM?
Massive-Question-550@reddit
Yes, it's easy. But you need the space in your PC, a big enough power supply, and enough cooling.
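Quick power-supply sanity check (board-power figures are datasheet ballparks; the headroom factor is my own rule of thumb):

```python
# PSU sizing for two 7900 XTX cards: GPUs + rest of system + margin.
gpu_tbp_w = 355         # RX 7900 XTX total board power (per card)
rest_of_system_w = 250  # CPU, board, drives, fans - rough allowance
headroom = 1.3          # ~30% margin for transient spikes and aging

print((2 * gpu_tbp_w + rest_of_system_w) * headroom)  # ~1250 W class PSU
```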
PraxisOG@reddit
AMD GPUs are underrated on VRAM per dollar if you're OK with them being slightly slower. I got two RX 6800 cards for $650, and they work pretty great on Windows. If you have the money, a 3090 is faster, but only for models that fit in its VRAM.
Daniokenon@reddit
No, the 7900xtx is a beast. I use it together with a 6900xt over Vulkan, with very good results on Windows 10. Remember to install the latest drivers from AMD; they add a lot of Vulkan improvements. ROCm is also developing steadily, although with two different cards it doesn't want to work for me on Windows, and on Linux I haven't managed to set it up properly.
Massive-Question-550@reddit
If you want to run a ton of smallish models all loaded at once, then sure, the 395 Max is a decent choice. Don't go with Nvidia's DIGITS box: it's 2x the price of AMD's product, doesn't run Windows, and you can't really play games or use it for general productivity. On the 395 Max, the biggest model you can use is a 70B Q4, and the performance isn't great. For big models, it's basically: get 2-3 3090s, or 3-4 5060 Ti 16GB, as that gets you the most VRAM for the money with still pretty good speed.
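Rough VRAM-per-dollar math behind that (prices are placeholder street prices - plug in your local numbers):

```python
# GB of VRAM per dollar for the two main candidates.
cards = {               # (VRAM GB, assumed street price USD)
    "3090 24GB (used)": (24, 700),
    "5060 Ti 16GB":     (16, 450),
}
for name, (vram, usd) in cards.items():
    print(f"{name}: {vram / usd * 100:.1f} GB per $100")
```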