Best bang for the buck GPU

Posted by Ok-Cucumber-7217@reddit | LocalLLaMA | 95 comments

I know this question is asked quite often, but going back to old posts makes me want to cry. I was naive enough to think that if I waited for the new generation of GPUs to come out, the older models would drop in price.

I'm curious about the best GPU for local LLMs right now. How is AMD's support looking these days? I have 3 PCIe slots (2 from the CPU, 1 from the chipset). What's the best bang for the buck?

I see the RTX 3060 12GB priced around $250. Meanwhile, the RTX 3090 24GB is around $850 or more, which leaves me unsure whether I should buy one RTX 3090 and leave some room for future upgrades, or just buy three RTX 3060s for roughly the same price.
I had also considered the NVIDIA P40 with 24GB a while back, but it's currently priced at over $400, which is crazy expensive compared to what it went for a year ago.

Also, I've seen mentions of risers, splitters, and bifurcation, but how viable are these approaches specifically for LLM inference? Will cutting down to x4 or x1 lanes per GPU actually tank performance?
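A rough back-of-envelope sketch of what lane count costs you, assuming PCIe 3.0 and an approximate ~1 GB/s of usable bandwidth per lane (both are my assumptions, not measured figures). For single-GPU or llama.cpp-style layer splitting, the weights stay in VRAM after loading, so a narrow link mostly slows down model load time rather than tokens/sec:

```python
# Back-of-envelope: how PCIe link width affects moving model weights onto the GPU.
# Assumption: PCIe 3.0, ~0.985 GB/s of effective bandwidth per lane.

PCIE3_GBPS_PER_LANE = 0.985  # approximate usable GB/s per PCIe 3.0 lane

def transfer_seconds(model_gb: float, lanes: int) -> float:
    """Time to push `model_gb` of weights over a PCIe 3.0 link with `lanes` lanes."""
    return model_gb / (PCIE3_GBPS_PER_LANE * lanes)

model_gb = 20.0  # e.g. a 32B model at ~4-bit quantization, roughly 18-20 GB on disk
for lanes in (16, 4, 1):
    print(f"x{lanes}: {transfer_seconds(model_gb, lanes):.1f} s to load {model_gb} GB")
```

That works out to roughly 1 s at x16, 5 s at x4, and 20 s at x1 for a ~20 GB model. Tensor-parallel setups that shuffle activations between GPUs every layer are more sensitive to link width than simple layer splitting.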

I mainly want to run 32B models (like Qwen2.5-Coder 32B), but running some 70B models like Llama 3.1 70B would be cool too.
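For sizing, here's a minimal VRAM estimate for those two models. It's a ballpark: weights only (bits per parameter divided by 8), plus a flat 2 GB allowance I'm assuming for KV cache and runtime overhead:

```python
# Ballpark VRAM estimate: weights at a given quantization level, plus a flat
# assumed overhead for KV cache and runtime buffers.

def vram_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 2.0) -> float:
    """Approximate VRAM in GB for `params_b` billion parameters at `bits_per_weight`."""
    weights_gb = params_b * bits_per_weight / 8  # bytes per parameter = bits / 8
    return weights_gb + overhead_gb

for name, params in (("Qwen2.5-Coder 32B", 32), ("Llama 3.1 70B", 70)):
    for bits in (4, 8):
        print(f"{name} @ ~{bits}-bit: ~{vram_gb(params, bits):.0f} GB")
```

By that estimate, a 4-bit 32B model lands around 18 GB, so a single 24GB 3090 fits it with room for context, while a 4-bit 70B wants roughly 37 GB, i.e. two 3090s or three 3060s (36 GB total) with the model split across cards.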