Theoretically, could multi-GPU technology come back if they link the video cards with some new super-fast interconnect and make the operating system see them as one device?
Posted by Custer_Vincen@reddit | hardware | View on Reddit | 44 comments
The old Nvidia SLI bridge had 1 GB/s of bandwidth, while typical video memory bandwidth was over 100 GB/s back then. Now the latest version of NVLink has a bandwidth of 1800 GB/s, and the RTX 5090 has about the same memory bandwidth, I think.
Sopel97@reddit
Multi-GPU works perfectly fine in the consumer space right now. Are you perhaps talking specifically about gaming?
Creative-Expert8086@reddit
The 5090 is 2x the 5080 in size and all key metrics, but on the gaming side it shows nowhere near 2x the performance of the 5080. Stacking more CUDA cores doesn't translate linearly into performance even on a single die, let alone two.
Nicholas-Steel@reddit
I think the big issue is memory bandwidth, which companies are trying to work around with high speed caches.
Toojara@reddit
Memory bandwidth isn't really the main issue; in general most Blackwell cards have more bandwidth per unit of compute throughput than their Lovelace equivalents. And even against the 5080, the 5090 has 86% more bandwidth for 105% more FP and texture throughput, which is pretty much in line. At low resolutions the biggest problem is the pixel fill rate, which hasn't increased from the 4090 at all.
Creative-Expert8086@reddit
I think the issue is that gaming performance doesn't scale anywhere near linearly with the core count of your GPU, while mining and ML, for example, do.
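To make that concrete, here's a rough CUDA sketch (the kernel and sizes are made up, purely to illustrate) of why compute-style workloads scale: each GPU gets its own independent slice of work and nothing has to be kept in lockstep frame to frame.

    #include <cuda_runtime.h>
    #include <vector>

    // Hypothetical kernel standing in for mining/ML-style work on one slice.
    __global__ void doWork(float* data, size_t n) {
        size_t i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] = data[i] * 2.0f;
    }

    int main() {
        int deviceCount = 0;
        cudaGetDeviceCount(&deviceCount);

        const size_t perGpu = 1 << 20;              // arbitrary slice size per GPU
        std::vector<float*> buffers(deviceCount);

        // Launch an independent slice on every GPU -- no cross-GPU sync is
        // needed, which is why throughput scales close to linearly here.
        for (int d = 0; d < deviceCount; ++d) {
            cudaSetDevice(d);
            cudaMalloc(&buffers[d], perGpu * sizeof(float));
            doWork<<<(perGpu + 255) / 256, 256>>>(buffers[d], perGpu);
        }
        for (int d = 0; d < deviceCount; ++d) {
            cudaSetDevice(d);
            cudaDeviceSynchronize();                // wait for this GPU's slice
            cudaFree(buffers[d]);
        }
        return 0;
    }

A game can't be chopped up like this, because every GPU's output has to land in the same frame, on time, in order.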
dparks1234@reddit
Would the 5090 be significantly faster with HBM instead of GDDR7?
Hamza9575@reddit
No. Because the memory bottleneck is not on the VRAM chips but rather on the wires coming out of the GPU die. They are literally the most stressed part of flagship GPUs, to the point that bigger dies are made solely so more wires can be attached to them and more data can be fed to the GPU die. That's why bigger dies are more powerful today: not because of more cores inside, but because of more wires at the edge leading to those cores.
hanotak@reddit
It already does exist, with datacenter NVLink: https://www.nvidia.com/en-us/data-center/nvlink/
It's just expensive AF and only matters for HPC with GPUs that are also expensive AF.
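Worth noting that even with NVLink the OS still sees separate devices; what you get is fast peer-to-peer memory access that software has to opt into. A minimal CUDA runtime sketch (error handling omitted; on GeForce cards without NVLink this may simply report that no P2P path exists):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int canAccess01 = 0, canAccess10 = 0;
        cudaDeviceCanAccessPeer(&canAccess01, 0, 1);  // can device 0 read device 1's memory?
        cudaDeviceCanAccessPeer(&canAccess10, 1, 0);

        if (canAccess01 && canAccess10) {
            cudaSetDevice(0);
            cudaDeviceEnablePeerAccess(1, 0);  // map device 1's memory into device 0's address space
            cudaSetDevice(1);
            cudaDeviceEnablePeerAccess(0, 0);
            std::printf("P2P enabled (over NVLink if present, otherwise PCIe)\n");
        } else {
            std::printf("No P2P path between devices 0 and 1\n");
        }
        return 0;
    }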
Custer_Vincen@reddit (OP)
Interesting, but why wouldn't they want to introduce this in the top consumer cards? The tech enthusiasts are buying 5090s without looking at the price, so why not sell them 4 units at once?
Eldiabolo18@reddit
Because it makes zero sense for Nvidia. The consumer market is something like 15% of their overall revenue, and most of that comes from the average cards, streaming, etc. The overall percentage of people who buy high-end cards is minuscule, and the percentage who would buy two or more high-end GPUs is nowhere near worth it.
Nvidia is an AI/HPC company; consumer products are legacy. If the AI boom keeps going, we might not have Nvidia GPUs for consumers in 10 years...
entarko@reddit
Because it would create competition with the professional segment, which is far more profitable. Right now, if you do small scale AI that can fit on one GPU, you have many choices, including consumer grade GPUs. But if you want to scale to larger models, you need to switch to professional GPUs, which Nvidia prices way higher. Also explains why Nvidia does not want to put more memory on the consumer grade GPUs.
gomurifle@reddit
I think this is the answer.
mixmastersang@reddit
So pro GPUs for local PCs support NVLink and consumer ones don't?
Low_Excitement_1715@reddit
Yes, but only for pooled computation, not for graphical acceleration.
SLI was always a solution looking for a problem, not the other way around. Making many GPUs and then spending lots of time and effort syncing them was always more expensive than making one faster GPU. It only made sense while they couldn't make a single GPU big enough, complex enough, and power-hungry enough to do it all in one chip.
Ch0miczeq@reddit
Because it's an unnecessary cost that almost no one would use. Content creators can already run multiple 5090s over PCIe lanes if they want; beyond two cards it's just better to render in the cloud.
cafk@reddit
The difference is that those workloads don't require the GPUs to synchronize at the end of every frame; they just need a bigger task pool.
For gaming you need additional optimization and output through a single device for frame generation, which also needs to work with the specific game rendering pipeline, so both drivers and games need to maintain support.
HippoLover85@reddit
Latency is the main issue. Bandwidth is what AI and pro users need, and they can get that. With gaming, every frame needs to be within milliseconds of the last. Synchronizing the work through a device like a ConnectX-7 and then a switch would likely give very poor performance without a LOT of software work.
BoringSociocrab@reddit
There is also Blackwell RTX 6000.
GPU-Appreciator@reddit
Because PCIe is already fast enough for many apps to implement this at the software level, and the only use cases where PCIe isn't fast enough are enterprise ones, for which they want you to buy high-end HPC stuff.
Jonny_H@reddit
That's still slower, higher latency, and much more power-hungry than the on-die interconnects between compute units on a modern GPU. If you want to treat them as "one device", that's the bandwidth you're competing against, not memory bandwidth.
To get the best out of even NVLink you still need some awareness of the "split" at the application level, and there are limits on which tasks fit that model well, as not all do.
hanotak@reddit
OP wasn't asking about on-die interconnects, since at that point it's not really multiple GPUs anymore. It's just tiling.
Jonny_H@reddit
No, I'm saying that for an operating system to truly treat them as one device, as asked by the op, you need to be on the level of on-die interconnects.
Hell, even current tiled compute devices have some pretty big limitations on /what/ they can separate out as "tiles" efficiently vs a monolithic die. Even the best EMIB-like solution puts limitations on the design (if you're working on a 2d plane at least, but then again nobody has got stacked logic really working yet either).
Mango-is-Mango@reddit
It's already coming back in a sense with Lossless Scaling. But we're not seeing SLI again.
a_man_of_mold@reddit
The lossless scaling dual GPU fad is entirely propped up by fanciful idiots. Imagine having all the drawbacks of dual GPU with the extra power usage, heat and noise. Then you have to fiddle around with LS and game settings endlessly, hoping whatever game you're playing even supports offloading rendering to the second GPU.
All this so you can use an inherently flawed technology to generate some interpolated frames, in a slightly less awful way. Hella epic bro!!!! Just buy a good single card instead of falling for this dumb meme, and sell whatever card you had before. Don't want to sell because you can't be bothered? Has sentimental value? Keep it as a paperweight, frame it on your wall or chuck it in the bin. Anything is better than using it for this nonsense.
zerinho6@reddit
Can you provide some technical reason why it is something "propped up by fanciful idiots"? I've seen spreadsheets and tests showing reduced latency when using dual GPUs, and I don't think buying a better GPU solves all the cases LS solves, nor is that the point of it.
SituationSoap@reddit
"I have seen spreadsheets that suggest that it's not always a huge waste of time and energy" is like the platonic ideal of something being propped up by fanciful idiots.
CarnivoreQA@reddit
"fake frames bad"
team56th@reddit
Honestly we have a better chance with an LSFG-style frame-multiplier card; that's about as close as we've come to a multi-GPU comeback.
xternocleidomastoide@reddit
The multi-GPU approaches you are thinking of, in terms of graphics/games, were developed to solve the bottlenecks of that era in the front and middle sections of the graphics pipeline.
However those haven't been a limiter in modern GPUs in a very long time.
The problem with SLI and CrossFire was that they created a new bottleneck at the end of the graphics pipeline (rasterization/frame generation).
So games had to be developed with multi-GPU aware profiles.
However, game developers are already taxed enough, and frankly most lack the low-level knowledge to do this. It was hard to find competent people to take care of it, so most developers just ignored it, and a lot of games ended up unoptimized. And since SLI/CrossFire are not transparent, you would end up with a lot of games where the extra GPUs went unused, or in some cases you got slowdowns.
There are still multi GPU approaches using NVLink. But those are for pro cards, and they are rather expensive.
So the key takeaway is that it is not worth it. You will not get enough developer engagement, and in an industry that is already a pressure cooker, adding an extra avenue of developer frustration is just not going to gain much traction.
Lastly, the market for multi-GPU setups at the consumer level was just tiny, and frankly at some point not worth bothering with for any of the players involved: game developers and driver/hardware vendors.
CatalyticDragon@reddit
This comes up every now and again so I have a prepared response.
Back in 1998 3dfx found they could very nearly double frame rates (or double resolution) by processing alternate lines (or frames) on different devices.
In GLQuake a Voodoo2 would get you a little over 60FPS at 800x600 but in SLI that was nearly 120FPS making it as fast as the Voodoo3 [source].
It worked so well that NVIDIA (who bought 3dfx) and AMD continued to support this sort of feature for a long time until it became clear that games were too complex for a simplistic driver side approach to be efficient.
The problem was the driver would present all GPUs as a single device. Game developers had no idea if it was one, two, or more GPUs. They couldn't optimize for it and you needed driver profiles for each game. Things often became messy and could even end up performing worse than with a single GPU as synchronization tasks interrupted rendering.
That's where DX12 and Vulkan come into the picture. Both of these graphics APIs were designed to allow native interaction with each GPU. Developers could access each one as needed (explicit multi-GPU), could set GPUs up as a "linked node adapter" where it worked like old SLI, with each GPU rendering alternate frames, or could go unlinked, where you access GPUs as separate compute devices just as you might with individual CPU cores.
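To give a rough idea of what the "linked node" path looks like in D3D12 (just a sketch, error handling omitted, and the function name is my own): the GPUs show up as nodes on a single ID3D12Device and you address each one with a node mask.

    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    // Sketch: one command queue per node (physical GPU) of a linked adapter.
    void createPerNodeQueues() {
        ComPtr<ID3D12Device> device;
        D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_12_0, IID_PPV_ARGS(&device));

        // Returns >1 only when the driver exposes the GPUs as one linked adapter.
        UINT nodeCount = device->GetNodeCount();

        for (UINT node = 0; node < nodeCount; ++node) {
            D3D12_COMMAND_QUEUE_DESC desc = {};
            desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
            desc.NodeMask = 1u << node;             // one bit per physical GPU
            ComPtr<ID3D12CommandQueue> queue;
            device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
            // In alternate-frame rendering, frame N is recorded and executed
            // on node (N % nodeCount).
        }
    }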
The newer multi-GPU functionality was implemented in a few games, and we saw scaling of 1.6x to 2x in notable examples like Deus Ex: Mankind Divided, Gears 4, and Rebellion engine games like Sniper Elite.
Because this was now native to the API, developers could optimize for it. And because it was built from the ground up with async compute in mind (meaning copy tasks could be done in the background, in parallel with other render tasks), and because PCIe bandwidth had advanced so much, there was no more issue with the stuttering and poor 1% lows which plagued the old driver-side approach.
This was so great, so flexible, that you could even use different types of GPUs together. Even GPUs from different manufacturers. Here's an NVIDIA GTX 970 and an AMD 390X working together to get 47% more performance than a single 390X, or 92% more performance than a single GTX 970. Or here's a Fury X and a GTX 980 Ti working together to be 137% faster than a single 980 Ti.
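That mixed-vendor setup is the "unlinked" explicit multi-adapter path: the application enumerates every adapter itself, creates a separate device for each, and moves intermediate results between them on its own. A rough DXGI/D3D12 sketch (error handling omitted, function name is my own):

    #include <d3d12.h>
    #include <dxgi1_6.h>
    #include <wrl/client.h>
    #include <vector>
    using Microsoft::WRL::ComPtr;

    // Sketch: one ID3D12Device per adapter, e.g. a GeForce and a Radeon side by side.
    std::vector<ComPtr<ID3D12Device>> createDevicePerAdapter() {
        ComPtr<IDXGIFactory6> factory;
        CreateDXGIFactory2(0, IID_PPV_ARGS(&factory));

        std::vector<ComPtr<ID3D12Device>> devices;
        ComPtr<IDXGIAdapter1> adapter;
        for (UINT i = 0;
             factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND;
             ++i) {
            ComPtr<ID3D12Device> device;
            if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                            IID_PPV_ARGS(&device)))) {
                devices.push_back(device);
                // It's up to the app to decide what each device renders and to
                // copy intermediate results across (shared heaps, readback, etc.).
            }
        }
        return devices;
    }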
So it's at this point you want to know: if it was so great, then why didn't it take off?!
Following on from the above, even though all this functionality was in DX12 / Vulkan, and even though all the communication ran over PCI Express, NVIDIA locked support away unless you bought a hardware dongle (an SLI bridge) from them. You could not click "enable" in the driver without it. This was not an NVIDIA feature; it was a standard Microsoft DirectX API-level feature, and NVIDIA put it behind a check box.
Then they began to drop the hardware connector support on lower-end GPUs, wiping out the ability to use this standard API feature for most people in the exact segment where it made the most sense. This was about the most anti-consumer thing I've ever seen in computing (even worse than NVIDIA's refusal to support open adaptive sync standards, or cheating in benchmarks).
To make matters worse, NVIDIA continued to conflate "SLI" (a branding term they got with the 3dfx deal), which represented a flawed technology from the 90s, with the multi-GPU technology in the APIs, which was developed over 15 years later. These two things were not remotely the same, but because of NVIDIA, people kept lumping the new API-level technology in with a decades-old, closed, driver-side technology with a poor reputation.
Around about this point somebody comes along saying "no no, it failed because temporal effects! TAA doesn't work with multiple GPUs". This is flat out wrong.
DLSS is a well-known temporal post-processing effect, and on page 53 of the "NVIDIA DLSS Super Resolution (version 3.7.0)" programming guide there are explicit instructions and code examples on how to set up DLSS to work with multi-GPU support. It is extremely easy to implement DLSS in linked-node mode with CreationNodeMask/VisibilityNodeMask, and it takes no more than two lines of optional code.
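For anyone wondering what those node masks actually mean: in plain D3D12 the same concept appears on heap/resource creation as CreationNodeMask and VisibleNodeMask (which GPU physically owns the memory versus which GPUs can address it). A hedged sketch of that D3D12 side, not the NGX call itself:

    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    // Sketch: a buffer resident on node 0 but visible to node 1 as well, the
    // kind of masking a linked-node post-processing pass relies on.
    ComPtr<ID3D12Resource> createCrossNodeBuffer(ID3D12Device* device, UINT64 size) {
        D3D12_HEAP_PROPERTIES heap = {};
        heap.Type = D3D12_HEAP_TYPE_DEFAULT;
        heap.CreationNodeMask = 0x1;        // physically lives on GPU/node 0
        heap.VisibleNodeMask  = 0x1 | 0x2;  // addressable from nodes 0 and 1

        D3D12_RESOURCE_DESC desc = {};
        desc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
        desc.Width = size;
        desc.Height = 1;
        desc.DepthOrArraySize = 1;
        desc.MipLevels = 1;
        desc.SampleDesc.Count = 1;
        desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

        ComPtr<ID3D12Resource> buffer;
        device->CreateCommittedResource(&heap, D3D12_HEAP_FLAG_NONE, &desc,
                                        D3D12_RESOURCE_STATE_COMMON, nullptr,
                                        IID_PPV_ARGS(&buffer));
        return buffer;
    }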
So that's why it isn't a thing even though every modern API supports it. People use Unreal Engine, they target consoles, and NVIDIA doesn't like it.
At least that is true in desktop video games. In simulation, AI, 3D rendering / ray tracing, video editing, and other workloads using multiple GPUs is very common.
Prasiatko@reddit
Why doesn't AMD use it?
RBeck@reddit
I used to Crossfire a few generations of AMD cards. Overall it wasn't that popular.
CatalyticDragon@reddit
Use it? Because AMD isn't a game developer. But AMD does support multi-GPU on every one of their graphics products, because it's a standard part of the DX12 and Vulkan APIs.
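You can see that support from the API side too: Vulkan 1.1 exposes linked GPUs as "device groups". A minimal sketch (assumes a valid Vulkan 1.1 VkInstance; error checks omitted):

    #include <vulkan/vulkan.h>
    #include <vector>
    #include <cstdio>

    void listDeviceGroups(VkInstance instance) {
        uint32_t groupCount = 0;
        vkEnumeratePhysicalDeviceGroups(instance, &groupCount, nullptr);

        std::vector<VkPhysicalDeviceGroupProperties> groups(groupCount);
        for (auto& g : groups) {
            g.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_GROUP_PROPERTIES;
            g.pNext = nullptr;
        }
        vkEnumeratePhysicalDeviceGroups(instance, &groupCount, groups.data());

        // A group with physicalDeviceCount > 1 is a set of GPUs the driver can
        // expose as one logical device (the "linked node" case).
        for (uint32_t i = 0; i < groupCount; ++i) {
            std::printf("Group %u: %u physical device(s)\n",
                        i, groups[i].physicalDeviceCount);
        }
    }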
Prasiatko@reddit
Ah, didn't know that. Will have to try it, as I've got a spare Vega card in the cupboard. Odd they don't advertise it more.
CatalyticDragon@reddit
They did but it didn't take off for the reasons I mentioned so now there are very few games which support it.
You could try to get Strange Brigade, Sniper Elite, or Ashes of the Singularity working but I don't think anyone has tested that for a long time.
webjunk1e@reddit
The main problem was always coordination, and that would still be a problem today. It's one thing to parallelize some sort of data or AI processing across multiple cards, but trying to render frames of a game consistently across multiple cards is a bloody nightmare. That's why Nvidia eventually dropped it: it never worked right, and there wasn't really any way to ever get it to work right.
kimi_rules@reddit
With PCIe 5.0, it's fast enough that you could link them over PCIe, with some limitations here and there.
Rafa998@reddit
The answer is latency. Sure, NVLink has the throughput, and it works well for AI stuff, and it would probably work well for async rendering too, but for real-time rendering the latency of a remote memory call is just too much. And since remote memory calls are not an option, what remains is the already-known alternate frame rendering and all its defects, which killed multi-GPU in the first place.
Maybe someday chiplets and newer-generation die interconnects can fill the gap for a 4x high-end GPU.
HippoLover85@reddit
InFO (TSMC's fan-out packaging) is coming to market soon and will allow AMD and Nvidia to put many GPU dies on the same package (or disaggregate them) relatively cheaply.
I'm not sure we will see this in 2026 for discrete GPUs (AMD's Strix Halo and Medusa Halo will use this tech, and those are 2025 and 2026 products), but we should definitely see it in 2027 and 2028, and it should allow for some large gains in top-end performance. It should also help (a little) with the costs of really high-end parts, but that help will likely be offset by advanced fab/node costs, so you are unlikely to notice...
Perfect-Cause-6943@reddit
People are using dual GPUs for AI training.
CatalyticDragon@reddit
And rendering, and video editing, and simulation..
createch@reddit
Theoretically yes, but it wasn't popular and the gains didn't work out in consumer workflows, so the concept was dropped from consumer GPUs.
In practice, that's when you move to professional and datacenter cards that aren't 3-slot-wide monsters and have better performance per watt.
triemdedwiat@reddit
What is the real killer application for this setup?
Then ask yourself how many other solutions already exist for this.
Then you'll understand why it existed in the first place and why it has almost disappeared now.
Chitrr@reddit
It is possible, but not worth it. Getting one stronger GPU is better and cheaper than getting two medium GPUs plus the link.