Is 2x 3090 with NVLink faster than 2x 4090 for large 70b models?

Posted by thefreemanever@reddit | LocalLLaMA | View on Reddit | 19 comments

I am wondering whether a 2x 3090 setup with NVLink is faster than 2x 4090s, for both inference and training/fine-tuning tasks.

I have read in several other posts about people getting 10 t/s on 2x 3090s and 17 t/s on 2x 3090s + NVLink. That would make NVLink roughly 70% faster.

Since the 40 series doesn't support NVLink, is a 2x 4090 setup still faster than 2x 3090s + NVLink?

If it's not, do you think there is any other reason to choose a 2x 4090s over 2x 3090s + NVLink for a home AI machine?

PS: The problem comes down to the slower communication speed over PCIe compared to NVLink. When a model exceeds 24GB, it has to be split across the cards, and 3090s + NVLink have a speed advantage there. However, I am not sure whether that advantage is enough to outweigh the raw compute and memory-bandwidth advantage of the 4090s.
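For a rough sense of scale, here is a back-of-envelope sketch of the per-token inter-GPU traffic when a 70B model is split across two cards with simple layer (pipeline) splitting. All link bandwidths and model dimensions below are approximate assumptions, not measurements, and tensor-parallel splitting would transfer considerably more per token than this:

```python
# Back-of-envelope: per-token inter-GPU traffic for a 70B model split
# across two GPUs at a single layer boundary (pipeline-style split).
# Numbers are rough assumptions for illustration only.

HIDDEN_SIZE = 8192       # Llama-2-70B hidden dimension (assumed)
BYTES_PER_VALUE = 2      # fp16 activations

# One activation vector crosses the split per generated token.
activation_bytes = HIDDEN_SIZE * BYTES_PER_VALUE  # 16 KiB per token

# Approximate one-way bandwidths (assumed, vendor-quoted ballpark figures).
links = {
    "PCIe 4.0 x16 (~32 GB/s)": 32e9,
    "NVLink on 3090 (~112 GB/s)": 112e9,
}

for name, bandwidth in links.items():
    transfer_us = activation_bytes / bandwidth * 1e6
    print(f"{name}: {transfer_us:.3f} us per token handoff")
```

With a pipeline-style split the handoff is tiny on either link, which is why NVLink gains tend to show up mainly with tensor parallelism or training, where gradients/activations are exchanged every layer rather than once per token.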