What’s the best way to add VRAM to my system?
Posted by mrgreatheart@reddit | LocalLLaMA | 17 comments
Apologies for the tedious Luddite question, I’ve been trying to read up and my head is spinning.
I have a 5070 Ti 16GB, Intel Ultra 7 CPU, and 32GB DDR5 RAM.
With the crazy used prices it looks like a 5060 Ti 16GB might be one of the cheapest ways to double my VRAM with an NVIDIA card. Would Ollama et al play nice with that combo?
Is there a cheaper or similarly priced but better route?
I assume it wouldn’t work mixing NVidia with AMD or Intel?
I’m in the UK in case that matters.
pheoxs@reddit
Finding a 3090 is generally the best value option
mrgreatheart@reddit (OP)
I’ve heard this. Would it play nice alongside my 5070Ti?
Kayo4life@reddit
Today I got reminded the 3000 series is no longer the most recent.
sagiroth@reddit
If you can fit both, yes. With llama.cpp there's no difference for inference when mixing mismatched cards.
mrgreatheart@reddit (OP)
Thank you
Fluffywings@reddit
Cheapest is a second 5070 Ti, if your system supports bifurcation of the PCIe slots and your PSU can handle another card. I would take this option depending on pricing.
For a single card, the best performance per dollar would be a 32GB card like the PRO R9700. Intel now has 32GB cards too, but their software support is very limited.
Alternatively, 24GB cards like the 7900 XTX or 3090.
mrgreatheart@reddit (OP)
I already have a 5070Ti. Are you saying I should buy a second identical card?
specify_@reddit
My experience with mixed-vendor GPUs is so-so. You will be limited by software support: if you use llama.cpp with AMD and NVIDIA cards in one system, you can use the Vulkan backend to drive all GPUs regardless of vendor. The drawback is that Vulkan has awful prompt processing, always worse than CUDA (on a pure NVIDIA system) or ROCm (on a pure AMD system). I have run llama.cpp Vulkan with an RTX 5060 Ti 16GB alongside a Radeon VII, and I would much rather use the CUDA and ROCm backends separately than use Vulkan.
Since you already have NVIDIA, I would suggest buying a GPU from the same family (Blackwell). You could also buy an RTX 3090, though you would miss out on the latest compute features Blackwell offers, for example running NVFP4 quants. In my experience those aren't as good as AWQ quants yet, but that may improve in the future.
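For what it's worth, a mixed-GPU llama.cpp run might look something like this. The model path is a placeholder and the split ratio is just an illustration weighted by VRAM; the flags themselves (`-ngl`, `--split-mode`, `--tensor-split`) are standard llama.cpp options:

```shell
# Sketch only: model path is a placeholder.
# -ngl 99             offload (up to) all layers to GPU
# --split-mode layer  split whole layers across the available GPUs
# --tensor-split      per-GPU ratio, e.g. weighted by VRAM (16GB : 24GB)
./llama-server -m ./models/model-Q4_K_M.gguf \
  -ngl 99 --split-mode layer --tensor-split 16,24
```

On a Vulkan build, both cards show up as devices regardless of vendor; on a CUDA build, only the NVIDIA cards do.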
mrgreatheart@reddit (OP)
Thank you!
I have a 3090 used near me for a good price.
A new 5060 Ti (16Gb) would run me about 30% more.
Would you go for the higher VRAM of the 3090 or the Blackwell architecture 5060 given that choice?
specify_@reddit
It's a no-brainer to go for the 3090, especially since you can find one cheaper than the 5060 Ti 16GB. The RTX 3090 has more bandwidth and VRAM than the 5060 Ti 16GB, so you can run larger models. And if you can run both the 3090 and the 5070 Ti at x16 PCIe lanes, that would be great for reducing inference startup times, although that would require an enterprise-grade CPU and motherboard, since virtually every consumer desktop CPU supports only 20-24 PCIe lanes.
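The bandwidth point matters more than it looks. Single-GPU decode speed is roughly memory-bandwidth-bound, so a back-of-the-envelope ceiling is bandwidth divided by bytes read per token. The bandwidth figures below are published specs; the model size is just an illustrative Q4 quant:

```python
# Crude upper bound on decode speed: every weight is read once per generated token,
# so tokens/s is capped near (memory bandwidth) / (model size in memory).

def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Theoretical ceiling, ignoring KV cache, compute, and overheads."""
    return bandwidth_gb_s / model_size_gb

rtx_3090 = 936.0      # GB/s (GDDR6X, 384-bit bus)
rtx_5060_ti = 448.0   # GB/s (GDDR7, 128-bit bus)
model = 18.0          # GB, e.g. a ~32B model at Q4 quantization (illustrative)

print(f"3090:    ~{max_tokens_per_sec(rtx_3090, model):.0f} tok/s ceiling")
print(f"5060 Ti: ~{max_tokens_per_sec(rtx_5060_ti, model):.0f} tok/s ceiling")
```

Real numbers will be well below these ceilings, but the roughly 2x bandwidth gap between the cards carries through to generation speed.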
mrgreatheart@reddit (OP)
Thanks so much. I'll grab the 3090.
droptableadventures@reddit
If you use GGML_BACKEND_DL=ON to have dynamically linked backends, you can load both the CUDA and ROCm backends without using Vulkan.
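If I understand the build docs right, that would look roughly like this (not verified on a mixed CUDA+ROCm box, so treat it as a sketch):

```shell
# Sketch: build llama.cpp with dynamically loaded backends so the CUDA (NVIDIA)
# and HIP/ROCm (AMD) backends can coexist in one install without Vulkan.
cmake -B build \
  -DGGML_BACKEND_DL=ON \
  -DGGML_CUDA=ON \
  -DGGML_HIP=ON
cmake --build build --config Release -j
# Each backend is built as a shared library and loaded at runtime,
# so llama.cpp can schedule layers across both vendors' GPUs.
```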
specify_@reddit
I actually didn't know that until now. I just looked through the compile-from-source documentation, and I'd never visited the portion about multiple GPU backends 💀 This might actually be a better option than Vulkan.
jacek2023@reddit
No idea about ollama, with llama.cpp I can use both 3090 and 3060/2070 without any issues.
mrgreatheart@reddit (OP)
Thanks
segmond@reddit
To add an additional card to your system, you need an extra slot to connect it, power cables to supply it (usually 2x 8-pin PCIe cables), and physical space, since most of these cards take up about 2 slots. So open up your PC and look: do you have a free x16 PCIe slot and room for the card to sit? Do you have unused power connectors to supply it? If so, buy the card. If not, you might have to upgrade your PC, or hack around it if you don't have the money to upgrade. Ollama and llama.cpp will support additional cards no problem.
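The power check is worth doing on paper first. The board-power figures below are published TDP specs for these cards; the CPU allowance and the 80% headroom rule are rules of thumb, not hard limits:

```python
# Rough PSU budget check before buying a second card.
# Board-power figures are published specs; 80% headroom is a rule of thumb
# (transient GPU spikes can briefly exceed rated TDP).

def psu_ok(psu_watts: int, draws: dict[str, int], headroom: float = 0.8) -> bool:
    """True if estimated peak draw stays under `headroom` fraction of PSU rating."""
    return sum(draws.values()) <= psu_watts * headroom

draws = {
    "RTX 5070 Ti": 300,  # W board power
    "RTX 3090": 350,     # W board power
    "CPU + rest": 250,   # W generous allowance for CPU, drives, fans
}

total = sum(draws.values())
print(f"{total} W estimated peak ->",
      "OK on 1000W" if psu_ok(1000, draws) else "tight on 1000W, consider bigger PSU")
```

By this estimate a 5070 Ti + 3090 combo is marginal on a 1000W unit and comfortable on 1200W.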
mrgreatheart@reddit (OP)
Thank you very much.