What's the latest status on 7900 XTX multi-GPU setups?
Posted by ziphnor@reddit | LocalLLaMA | View on Reddit | 13 comments
I am currently running dual RTX 5060 Ti 16GB cards (both of which are easy to sell or reuse in other PCs at home) and monitoring the used market for more of the same, or alternatively an RTX 3090. I couldn't help but notice that some quite "juicy" prices occasionally show up for the 7900 XTX (50-60% of the used RTX 3090 price).
I know that AMD's software maturity has lagged behind, but also that catching up is being actively worked on. The 7900 XTX has some pretty nice stats overall versus the 3090 (same memory bandwidth, same VRAM and much higher TFLOPS, but lacking NVLink of course).
Is tensor parallelism etc. supported by now in e.g. vllm and other engines?
BigYoSpeck@reddit
Tensor parallelism works in vllm
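Something like this minimal sketch with the offline Python API is what I mean; the model name and memory setting are just placeholders, not recommendations:

```python
# Minimal sketch: shard one model across two GPUs with vLLM tensor parallelism.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-14B-Instruct",  # placeholder model, swap in whatever fits your VRAM
    tensor_parallel_size=2,             # split the weights across both cards
    gpu_memory_utilization=0.90,        # placeholder, tune for your setup
)

out = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(out[0].outputs[0].text)
```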
Split mode tensor works in llama.cpp with ROCm but not Vulkan. It's still occasionally flaky so I often settle for the performance drop of row or layer split methods
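For reference, a rough sketch of the layer/row split options via llama-cpp-python (assuming a build with ROCm/HIP support; the model path and split ratios are placeholders):

```python
# Rough sketch of multi-GPU splitting in llama-cpp-python.
import llama_cpp

llm = llama_cpp.Llama(
    model_path="/models/your-model-q4_k_m.gguf",    # placeholder path
    n_gpu_layers=-1,                                # offload all layers to the GPUs
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_ROW,      # or LLAMA_SPLIT_MODE_LAYER if row is flaky
    tensor_split=[0.5, 0.5],                        # even split across two cards
)

print(llm("Q: What is ROCm?\nA:", max_tokens=48)["choices"][0]["text"])
```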
I run two, and they're not as fast as 3090s, but the ecosystem for running inference on them has matured a lot. I can't really think of any show-stopping issues I've hit using them.
The lower price, better general Linux support, faster gaming performance, and ultimately still being fast enough for running models while giving the same capacity as 3090s mean I'm happy to live with their slight compromises versus Nvidia.
All that said, if I already had 2x 5060 Tis (about the same power draw as one 7900 XTX), and I had the motherboard, case and PSU to handle it, I might be tempted to just add in one 3090 rather than splurging on two 7900 XTXs.
sn2006gy@reddit
Too bad 7900 XTX prices have caught on to this... they're up 500 bucks from just a few weeks ago.
xeeff@reddit
that's crazy cuz I just bought a used 7900 XTX for £560 and it's only 6 months old; temps are great, no complaints. I also bought another for £570, but it's got issues with temps so I'm returning it, although it's still in my possession. Would love to use 2x 7900 XTX but my power supply is only 750W ;(
Iron-Over@reddit
Wow, bought in September for 999 Canadian, now 1500+. Glad I got a couple.
ziphnor@reddit (OP)
I guess it depends on where you are; I have seen a few selling used for maybe 10% over a 5060 Ti 16GB (also used).
sn2006gy@reddit
yeah, they're all expensive.
ziphnor@reddit (OP)
What kind of performance are we talking about? On my 2x RTX 5060 Ti 16GB, using vllm 0.20.0 and the latest guides for speculative decoding, I have the nvfp4 version at ~70 t/s and nicely high preprocessing speed (token generation can go +10% or so with the Lorbus Autoround INT4 model, but preprocessing is cut in half then).
In general I am considering whether I should move to a 4x 5060 Ti setup, or just put the two I have into the kids' two gaming PCs (and sell their ancient GPUs) and then buy some 7900 XTXs instead. I mean, 3090s would also be great, but they are selling for ~2x the 5060 prices.
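If it helps, this is roughly what the ngram speculative decoding setup looks like in the Python API; the exact config keys have moved around between vllm releases, so treat it as a sketch and check the docs for your version (the model name is a placeholder):

```python
# Hedged sketch of prompt-lookup (ngram) speculative decoding in vLLM.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/your-nvfp4-model",   # placeholder
    tensor_parallel_size=2,
    speculative_config={
        "method": "ngram",               # draft tokens via prompt lookup, no draft model needed
        "num_speculative_tokens": 4,
        "prompt_lookup_max": 4,
        "prompt_lookup_min": 2,
    },
)

out = llm.generate(["def quicksort(arr):"], SamplingParams(max_tokens=128))
print(out[0].outputs[0].text)
```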
Glittering-Call8746@reddit
How would 4 GPUs help? Wouldn't the PCIe bus be a drag? Do you have an Epyc workstation?
ziphnor@reddit (OP)
I initially heard that PCIe bandwidth would be a major issue, but I have also read from a lot of people that the impact is not that bad in practice. Obviously I am trying to figure this out before buying :) (or maybe buy, and just sell again if it does not work out).
In my current setup the PCIe situation is already horrible, with one GPU on a real x16 PCIe Gen 5 slot and the other on a chipset PCIe Gen 4 x4 slot, and I am getting ~60-70 t/s with Qwen 3.6 nvfp4 using vllm. If I add more cards they would be on M.2 adapters; one could get Gen 5 x4 and any others would be on Gen 4 x4.
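For checking what link each NVIDIA card has actually negotiated, a quick pynvml sketch (assumes the nvidia-ml-py package is installed; cards often report a lower link state when idle, and this only covers the NVIDIA GPUs):

```python
# Print the currently negotiated PCIe generation and lane width per NVIDIA GPU.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    h = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(h)
    name = name.decode() if isinstance(name, bytes) else name
    gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(h)   # may read lower at idle
    width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(h)
    print(f"GPU {i}: {name}: PCIe Gen{gen} x{width}")
pynvml.nvmlShutdown()
```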
starkruzr@reddit
I think it depends on what you're trying to do. LLMs are mostly fine with PCIe bandwidth limitations but when you start to get into stuff like image gen it really starts to suffer.
ziphnor@reddit (OP)
Well, this is localllama :) I am focused mostly on agentic coding and normal chatbot use. However, I am also interested in image gen (ComfyUI etc.), so that is interesting to know. I actually thought image gen was even easier to split the work for.
starkruzr@reddit
you know, I think my info is outdated. I just googled and apparently you can now do things like put the diffusion model on one card and the text encoder on the other. ComfyUI-MultiGPU explicitly manages this. So probably worth looking at after all?
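I haven't used ComfyUI-MultiGPU myself, but the same idea can be sketched outside ComfyUI with diffusers, letting it place pipeline components (text encoders, UNet/transformer, VAE) on different GPUs; device_map="balanced" is my assumption about the supported pipeline-level strategy, so double-check against your diffusers version:

```python
# Sketch: spread an SDXL pipeline's components across the visible GPUs.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # example model
    torch_dtype=torch.float16,
    device_map="balanced",                       # distribute components over available GPUs
)

image = pipe("a workshop full of GPUs", num_inference_steps=30).images[0]
image.save("out.png")
```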
deathcom65@reddit
They're terrible, don't buy them and leave them for us 😂. Can't let the cat out of the bag until I've snagged a few more.