RTX Pro 4000 + 2000 Ada ?
Posted by bromatofiel@reddit | LocalLLaMA | View on Reddit | 4 comments
So I just bought an RTX Pro 4000 Blackwell 24GB to replace my RTX 2000 Ada 16GB. So far I've been tinkering with llama-cpp, especially with Qwen 3.6 MoE, and I was wondering if it's worth keeping both GPUs. I know that theoretically more VRAM is better, but do I have to follow RAM-like rules such as "both GPUs should be the same size" or something similar? Moreover, can both GPUs communicate over PCIe, or should I look for more exotic connectivity? Kind of a GPU newbie here, so sorry for the dumb questions ¯\\_(ツ)_/¯
PassengerPigeon343@reddit
You can experiment with splitting across cards, or you can push the model onto one card and use the second card for other workloads, like a speech-to-text model for voice mode or a smaller task model. If your main model doesn't support vision, for instance, you could keep an always-hot vision-capable model on the second card and route vision tasks to it. It's always nice to have more compute and more VRAM.
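The two-model setup above can be sketched with two llama-server instances, each pinned to one GPU via `CUDA_VISIBLE_DEVICES`. The model file names and ports here are placeholders, and `--mmproj` assumes a multimodal GGUF with a separate projector file:

```shell
# Main text model on the RTX Pro 4000 (GPU 0); -ngl 99 offloads all layers
CUDA_VISIBLE_DEVICES=0 llama-server -m main-model.gguf -ngl 99 --port 8080 &

# Always-hot vision-capable model on the RTX 2000 Ada (GPU 1)
CUDA_VISIBLE_DEVICES=1 llama-server -m vision-model.gguf \
  --mmproj mmproj.gguf -ngl 99 --port 8081 &
```

Your router/client then sends vision requests to port 8081 and everything else to 8080.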
abnormal_human@reddit
More GPUs is always better for something, until PCIe slots or bus bandwidth become your bottleneck, and you're not close to that. You can pool across them to squeeze in a larger model, or you can use them for independent tasks.
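Pooling in llama.cpp is a couple of flags; a sketch, assuming the 24GB card is GPU 0 and a placeholder model file (the 24,16 ratio just mirrors the VRAM sizes and can be tuned):

```shell
# Split whole layers across both GPUs, weighted roughly by VRAM (24GB : 16GB)
llama-server -m big-model.gguf -ngl 99 \
  --split-mode layer --tensor-split 24,16 --main-gpu 0
```

Layer splitting keeps PCIe traffic low: each card runs its own contiguous slice of layers and only activations cross the bus.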
Miserable-Dare5090@reddit
PCIe is fine. Look into tensor parallelism and you can run 32GB-class models, plus KV cache, with your RTX Pro 4000 Blackwell as the main card. If you use a frontier model to help you set it up, you can optimize it.
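llama.cpp's closest analogue to tensor parallelism is row-wise splitting, which shards individual weight tensors across the cards instead of assigning whole layers; a sketch with the same placeholder model file:

```shell
# Row split: each matmul is sharded across both GPUs, so both cards work on
# every layer (more PCIe traffic than --split-mode layer); small tensors and
# intermediate results live on the --main-gpu card
llama-server -m big-model.gguf -ngl 99 \
  --split-mode row --tensor-split 24,16 --main-gpu 0
```

Whether row beats layer splitting depends on the model and your PCIe bandwidth, so it's worth benchmarking both.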
Kyuiki@reddit
My understanding is that your speed will be limited by your slowest card when pooling VRAM. So a 3090 will slow down a 4090, and a 4090 will slow down a 5090.
The main thing combining cards gets you is more room to load bigger models. So if the new card won't push you into the next model-size bracket, it's better to just have the slower, smaller card run smaller models on its own.
I’m new too so this is based on my own research and I could be wrong.
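A back-of-the-envelope check for the "next bracket" question: weights take roughly params × bits / 8 bytes, plus headroom for KV cache and runtime overhead (the 4 GB headroom below is a rough assumption, not a fixed rule):

```shell
params_b=30   # model size in billions of parameters
bits=4        # quantization width (e.g. Q4 is ~4 bits per weight)
headroom=4    # GB for KV cache + overhead -- rough assumption
# weights GB = params_b * bits / 8; add headroom on top
echo "$params_b $bits $headroom" | awk '{printf "%.1f GB needed\n", $1*$2/8 + $3}'
# -> 19.0 GB needed
```

By that estimate a 30B model at 4-bit (~19 GB) fits on the 24GB card alone, while the same model at 8-bit (~34 GB) would only fit by pooling both cards.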