Is 2x 3090 with NVLink faster than 2x 4090 for large 70b models?
Posted by thefreemanever@reddit | LocalLLaMA | View on Reddit | 19 comments
I am wondering whether a 2x 3090 + NVLink setup is faster than 2x 4090s, for both inference and training/fine-tuning tasks.
I have read several other posts where people report getting 10 t/s on 2x 3090s and 17 t/s on 2x 3090s + NVLink. That would mean NVLink is about 70% faster.
Since the 40 series doesn't support NVLink, is a 2x 4090 setup still faster than 2x 3090s + NVLink?
If it isn't, do you think there is any other reason to choose 2x 4090s over 2x 3090s + NVLink for a home AI machine?
PS: The problem revolves around the slower communication speed of PCIe compared to NVLink. When a model exceeds 24GB, it must be split across the cards, and 3090s + NVLink have a speed advantage in this respect. However, I am not sure whether that advantage outweighs the raw speed advantage of the 4090s.
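For a sense of scale, here is a back-of-envelope sketch of what actually crosses the inter-card link per token when a model is layer-split across two GPUs. The hidden size (8192, Llama-2-70B-like) and the PCIe 3.0 x4 bandwidth figure are illustrative assumptions:

```python
# Back-of-envelope: per-token traffic between cards when a 70B model
# is layer-split (pipeline-style) across two GPUs. Assumes a
# Llama-2-70B-like hidden size of 8192 and fp16 activations.
hidden_size = 8192          # assumed model dimension
bytes_per_value = 2         # fp16
act_bytes = hidden_size * bytes_per_value   # activation handed to GPU 2 per token
print(f"activation per token: {act_bytes / 1024:.0f} KiB")

# Even a slow PCIe 3.0 x4 link (~4 GB/s, rough figure) moves that in microseconds:
pcie3_x4 = 4e9              # bytes/s
transfer_us = act_bytes / pcie3_x4 * 1e6
print(f"transfer time over PCIe 3.0 x4: {transfer_us:.1f} us")
```

The per-token payload is tiny relative to any modern link, which is why several commenters below see no inference speedup from NVLink with a simple layer split.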
ZET_unown_@reddit
PhD student specializing in Computer Vision here. In general, 2x 4090s will be faster than 2x 3090s NVLinked by around 40%.
See the link below and scroll down to the image right above the Conclusions section, where they specifically benchmark 2x 4090 against 3090s: https://lambdalabs.com/blog/nvidia-rtx-4090-vs-rtx-3090-deep-learning-benchmark
Whether it's worth the price difference is up to you and what's available to you.
I don't know your specific use case, whether it's only fine-tuning and inference or whether you also want to research and build new models. But as a general rule, you should always go for the largest VRAM on a single card, because that's more often the limiting factor. With a slower card, you just need to wait a few days longer; with too little VRAM, you can't train the model at all. VRAM pooling is a lot of headache, and how well it works depends heavily on the model itself.
I would recommend the RTX 6000 Ada 48GB or the older RTX A6000 48GB. Out of the dual 4090s and dual 3090s, I would go with the dual 4090s.
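The VRAM-first advice above can be made concrete with a rough estimate of what a 70B model's weights alone require at common precisions (illustrative arithmetic only; KV cache and activation overhead come on top):

```python
# Rough VRAM needed just for 70B weights at common precisions.
# Excludes KV cache, activations, and framework overhead.
params = 70e9
needs_gb = {
    name: params * bits / 8 / 1e9
    for name, bits in [("fp16", 16), ("8-bit", 8), ("5bpw", 5), ("4-bit", 4)]
}
for name, gb in needs_gb.items():
    print(f"{name}: ~{gb:.0f} GB")
```

Even at 4-bit, a 70B model won't fit on one 24GB card, which is why the split (and the link between the cards) matters at all.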
GetOutOfMyFeedNow@reddit
The 4090 cannot be NVLinked.
ZET_unown_@reddit
You don’t need NVLink. It's one of the factors affecting speed, but just because a setup doesn’t have it doesn’t mean it will be slower. I have access to 3090s and 4090s at my university research lab, and a set of 4090s without NVLink is faster than NVLinked 3090s for neural network training and inference.
GetOutOfMyFeedNow@reddit
Two 3090s act as one if they are NVLinked, and that matters when running bigger models.
ZET_unown_@reddit
In all my dual-GPU training and inference runs (using around 46GB of VRAM), the dual 4090s have been faster than NVLinked 3090s.
Dusty_da_Cat@reddit
I have tried running 2x 3090s NVLinked.
It didn't do squat for inference, and I can't confirm either way for training/fine-tuning, since I haven't tried that.
It might be a Windows thing, but I haven't found any benefit in added t/s, NVLinked or not. Might be different on Linux.
absolutxtr@reddit
Well, it won't necessarily be faster unless you use something that supports some sort of parallelism. But it can certainly fit bigger models! It's just creating a VRAM pool, by way of an insanely fast communication link between both cards and their VRAM.
Pedalnomica@reddit
Most reports I've seen say NVLink doesn't help much at all with inference (unless the PCIe connection between your cards is slow as hell, maybe <= PCIe 3.0 x4?) and helps at most 20% with training/fine-tuning on 3090s.
4090 pros: they will be faster (again, assuming reasonable PCIe speeds), but probably not by a ton. They are also more modern, so there are things they can do in CUDA that the 3090 can't.
Depends on what you're doing, but for most people around here, buying more 3090s is the better buy.
Dusty_da_Cat@reddit
Going from PCIe 5.0 x8/x8 to PCIe 3.0 x4/x4 didn't have an effect on t/s either.
I ended up putting a 3090 Ti on the PCIe 5.0 slot at x16, with the 2x 3090s on PCIe 3.0 x4/x4. Even when I configured a GPU split to use just the 3090s, the inference t/s didn't change: it averages 13 t/s on a 70b at 5bpw and 11 t/s on a 120b at 3bpw.
However, I do acknowledge there is a potential gain from llama.cpp's NVLink support, but I'm not sure the effort is worth it: switching to the slower llama.cpp for a 'potential' NVLink speedup with additional tinkering needed, versus staying on exllama2 for already-fast inference speeds with minimal effort.
ApprehensiveView2003@reddit
Can we get a year later update?
Dusty_da_Cat@reddit
What kind of update are you looking for?
Guilty-History-9249@reddit
update
nero10578@reddit
NVLink basically doesn’t do shit for inference, and it only helps training depending on whether you use inter-GPU communication a lot or not.
ApprehensiveView2003@reddit
You can load larger models for one...
rdkilla@reddit
the extra 12000 CUDA cores?
deleted_by_reddit@reddit
There is not much communication during inference. NVLink is mostly useful in training, where you need to move all the gradients of the model between all the cards. Just think about your use case: what data gets moved, and how much.
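The asymmetry can be sketched with rough numbers. Assuming a 7B-parameter model trained in fp16 and a Llama-like hidden size of 8192 (both illustrative), gradients exchanged per training step dwarf the activations exchanged per inference token:

```python
# Rough contrast: data moved per training step (gradient all-reduce)
# vs per inference token (a single hidden-state activation),
# assuming a 7B-parameter fp16 model split across 2 GPUs.
params = 7e9
grad_bytes = params * 2                 # fp16 gradients, ~1x buffer per GPU
per_step_gb = grad_bytes / 1e9          # crosses the link every optimizer step

hidden_size = 8192                      # assumed model dimension
per_token_kb = hidden_size * 2 / 1024   # fp16 activation crossing the link
print(f"training:  ~{per_step_gb:.0f} GB of gradients per optimizer step")
print(f"inference: ~{per_token_kb:.0f} KB of activations per token")
```

That gap (gigabytes per step vs kilobytes per token) is why NVLink can matter for training while barely registering for inference.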
alb5357@reddit
So the GPU-to-GPU communication isn't a bottleneck? Then shouldn't 2x 3090s be twice as fast as 1x 3090 on inference?
prudant@reddit
I think it will speed up your inference if you go for tensor parallelism mode, with vLLM for example. Pipeline parallelism is almost unaffected by NVLink.
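For reference, a minimal sketch of what tensor parallelism looks like in vLLM's offline API. This is a configuration sketch, not a tested run: it assumes vLLM is installed, two GPUs are visible, and the model name is illustrative:

```python
# Sketch: shard a 70B model across two GPUs with tensor parallelism in vLLM.
# Requires 2 visible GPUs and downloaded weights; model name is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # illustrative model
    tensor_parallel_size=2,                  # split each layer across both cards
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

With tensor parallelism, every layer's matmuls are split across both cards and the partial results are all-reduced each layer, so inter-GPU bandwidth matters far more than in the pipeline-style layer split discussed above.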