RTX 5090 rumored to have 1.8 TB/s memory bandwidth
Posted by TechNerd10191@reddit | LocalLLaMA | View on Reddit | 231 comments
As per this article, the 5090 is rumored to have 1.8 TB/s of memory bandwidth and a 512-bit memory bus - which would make it faster than any professional card except the A100/H100, which use HBM2e/HBM3 with ~2 TB/s of bandwidth on a 5120-bit bus.
Even though the VRAM is limited to 32GB (GDDR7), it could be the fastest card for running any LLM under 30B at Q6.
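For what it's worth, the rumored figure lines up with 28 Gbps GDDR7 modules on a 512-bit bus (the per-pin data rate is the assumption here):

```python
# bandwidth = bus width * per-pin data rate; 28 Gbps is the rumored GDDR7 speed
bus_bits = 512
gbps_per_pin = 28
print(bus_bits / 8 * gbps_per_pin)  # 1792 GB/s, i.e. ~1.8 TB/s
```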
CountPacula@reddit
Wake me up when they actually put a useful amount of memory on the card. Two whole generations and only eight GB more memory? I don't care how fast the memory is, that's not enough of it, especially when you can get multiple 3090s for the same price as a single 5090.
ReasonablePossum_@reddit
They will be forced to put more VRAM in there once most games start intensively using LLMs and other AI as part of their regular functioning.
Sadly, game devs are moving slowly here.
evia89@reddit
I don't see AI in games (DLSS is not AI) for at least the next 10 years. AI will be used to help create games, but never to run on user hardware.
ReasonablePossum_@reddit
Never is a very strong word there. Nvidia themselves are creating the tech that will push this.
10 years, sure. Probably a lot less though - the new devs who got their education in the age of AI will be implementing the tech in the next 2-3 years. But again, stuff is slow :(
cobbleplox@reddit
Games are just very tricky in that regard. The GPU the player has is already fully in use by the regular game. And I would not trust even some 30B to be reliable in whatever little feature I can construct around it. Response times are also completely awful for the instant reactions required in games. And and and and.
TheTerrasque@reddit
It would have to be a tiny model, like 3B maybe. Perhaps if you fine-tuned one a lot and put some guard rails on it, it wouldn't be completely batshit cuckoo.
cobbleplox@reddit
Again, this is getting very close to "what's the point". Because first you have to make a proper finetune and the related dataset, and then you are confronted with a rather stupid LLM that will mostly parrot the training data anyway - and that is probably even the point. It just sounds so pointless to begin with. The day of these things will come, but it's when you can actually give them the agency you would integrate them for in the first place. And that day is not here. At least that's my opinion. It's also why, to my knowledge, you can't even put something on Steam where a user can freely interact with an AI.
TheTerrasque@reddit
I've been toying with the idea, and yeah... what's the point at that point? A 30B model - who could even run that game? "Oh, you need two graphics cards in your PC, and one needs to be an x090" doesn't seem like a good market.
To make something that could actually be played by a decent chunk of gamers, you'd have to have a tiny model. With finetune and some guard rails you might get something that's a bit beyond dialogue trees, but frankly not much beyond. Might be enough to add a bit of flavor to the conversations.
cobbleplox@reddit
Guard: "Sure, I can do that! The guard shouts "Who is there?" - Let me know if you need anything else."
TheTerrasque@reddit
I did some quick experiments with some small models and a simple prompt, and gemma2 2b has some nice flavor, but doesn't follow instructions well. Nemotron-mini followed instructions much better, but was very dry.
Here's gemma2: [prompt and example conversation omitted]
It's not terrible, but not great either. A small model gets easily confused, and this one seems to forget the facts. Such small models also get confused easily if the prompt is too big. Still, I don't think it's impossible. This model takes about 2.6GB of GPU RAM, and still performs kinda meh - not really usable.
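For anyone curious, a minimal sketch of the kind of setup I mean, assuming a local Ollama server with gemma2:2b pulled - the endpoint and payload follow Ollama's /api/chat convention, and the guard prompt itself is purely illustrative:

```python
import requests

SYSTEM = (
    "You are a city guard at the north gate of a medieval town. Stay in "
    "character and keep replies to one or two short sentences. You only know "
    "what is in this prompt: there is a curfew at sundown, and a thief stole "
    "the baker's cart yesterday."
)

def guard_reply(history):
    # history: a list of {"role": "user"/"assistant", "content": ...} turns
    r = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "gemma2:2b",
            "messages": [{"role": "system", "content": SYSTEM}] + history,
            "stream": False,
        },
        timeout=60,
    )
    return r.json()["message"]["content"]

print(guard_reply([{"role": "user", "content": "Good evening. May I pass?"}]))
```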
GetOutOfMyFeedNow@reddit
Ironic and funny that the Guard literally became the AI chatbot at the end 😂
ReasonablePossum_@reddit
Story progression decisions and changes depending on user input and actions can be placed on a small LLM; optimization can come from the new real-time upscaling; future VR rendering can be managed via GPU-dependent real-time apps; and graphics like particle/physics simulations can be managed there as well.
I mean, lots of stuff that already exists on paper or at the prototype-testing level will use VRAM that, as you mentioned, is currently fully devoted to running the game itself, and it will be gradually implemented out there.
cobbleplox@reddit
See, one starts to lock it into rigid structures so it can't fuck up. Isn't it interesting how that immediately sounds like something you could do without any LLM?
I guess you could do a more modern version of these adventures where you have to enter text like "take soap", leading to perfectly predefined stuff, sure. But that's not what anyone expects from the groundbreaking AI-games, I would say.
ReasonablePossum_@reddit
It would be a lot cheaper on the dev side to have it done by an LLM rather than programming everything by hand. Plus this would leave a lot more creative space instead of limiting it to the man-hours that were budgeted for it.
cobbleplox@reddit
Integrating an LLM into a proper game - where you already had to lock it down in a rigid system to only do stuff you planned for - is a lot of things, but certainly not a time saver. And if you consider the missing 1000 lines of ifs to be so much work, and you think so highly of LLMs, you might as well let an LLM write them.
ReasonablePossum_@reddit
I'm not talking about 1000 ifs (that's a simple, boring, classic tree). That's my point: 10-20x that amount. And I'm not talking about the devs integrating the LLM, but about one living on the GPU, used by all games equally through a standard framework.
cobbleplox@reddit
The 1000 ifs were sort of a metaphor. And the LLM gets onto that GPU by the devs integrating it into the game.
ReasonablePossum_@reddit
It's a waste of resources that way though. It's a lot more effective to have a native LLM on the GPU that's optimized and ready to accept certain types of problems.
Standard-Anybody@reddit
I know for certain that AI is being used in games right now that you almost certainly play. Just for not what you expect. There are models running in games that do things like show recommended items to buy, etc.
evia89@reddit
Like in LoL and Dota? I think that runs on the server.
satireplusplus@reddit
Blockbuster games consistently push the state of the art. People bought new RTX cards to play the first raytracing games. They will buy nvidias 6090 plus or whatever to play the next blockbuster game with AI characters in it. Realistically, games using LLMs will also have a "run it in the cloud" option until large VRAM cards will be the norm.
Desm0nt@reddit
It's good that game devs are moving so slowly. I don't want to pay $2k+ every year just to play new games. If top-tier cards became 1-2 year cards at such a high price, it would be an even worse situation than we have now (and it would make all low- and mid-tier cards almost useless trash even for gamers, which is probably untenable).
Persistent_Dry_Cough@reddit
The functionality would improve. You could still play games at the current level of functionality with the older card.
IntelVEVO@reddit
Nvidia wants you to fork out 10k for the RTX pro 6000 96gb
CubicleHermit@reddit
This is a gaming card. Later in the year, you can safely expect there will be a much pricier workstation version, probably with 64GB.
Rude_Assignment_5653@reddit
Well we are in the darkest timeline lmao
strawboard@reddit
I think VRAM is the most scarce resource in the universe which is why they can’t put more in video cards.
It’s extra funny because there is huge potential to use AI in video games for all sorts of things, like NPC dialog and dynamic texture and model creation.
Nvidia being stingy even with their highest end card is infuriating.
Amoral_Abe@reddit
Nvidia is likely being stingy because of AI. They are the primary company selling super high end AI cards for 10s of thousands of dollars each. They have to be careful not to cut into that market since that's where their big money is made.
b0tbuilder@reddit
You would think this would push AMD to change the game by making more VRAM available - it would be far more practical in RPGs for character dialog and more immersion. Unfortunately they too are chasing the datacenter margin dragon. The only way this changes is if enough bleeding-edge capacity becomes available at the foundries to split production: datacenter parts on a more advanced process node, consumer GPUs on an older node that is considerably less power efficient. That's a simple thing to suggest and a far more difficult thing to execute on, however.
adrenalinnrush@reddit
Honestly, it's good that they're clearly separating the market. Otherwise it would be like the crypto mining shortages all over again. Do you really want that?
strawboard@reddit
Yes, we know. Everybody knows, and we need to call them out on it as much as possible.
b0tbuilder@reddit
You are very much correct. The thing that makes VRAM as rare as unicorn blood is the fact that adding VRAM creates competition with datacenter hardware. It is super annoying because in many cases more VRAM could be added without massive cost. We are all beholden to datacenter product margins.
entmike@reddit
Not to defend Nvidia, but the consumer/prosumer line of GPUs will never satisfy all AI use cases we wish. I for one am happy to have a 32GB ceiling on a single GPU over the 24GB ceiling, assuming it doesn't cost as much or more than the A6000 I've been looking at the last 2 years lol
lolzinventor@reddit
For some applications like Stable Diffusion it's not as easy to split a model over multiple GPUs. That makes the extra 8GB worthwhile imo.
OutOfBandDev@reddit
Just making a larger card will not fix the issue. Forcing people to use fewer resources is where we get real improvements. Why do you think DeepSeek-R1 is revolutionary?
SocialDinamo@reddit
I completely feel where you’re coming from, but it wasn’t until I started playing with P40s that I realized it doesn’t matter how big the VRAM is if you are running those bigger models super slowly.
Persistent_Dry_Cough@reddit
In fact, bigger VRAM is the ONLY way you will improve inference speed on the big models. Widening the bus to the maximum possible should be a given - bus width depends more on the configuration of the chips on the board than on a costly-to-manufacture memory controller upgrade. The most important issue with a GPU is that if you have to swap part of the model into system RAM, or god forbid NVMe, you're screwed on inference speed.
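The rule of thumb behind that, as a sketch (single-user decoding; numbers illustrative): every generated token has to stream the whole model through the memory bus once, so bandwidth sets the ceiling on tokens per second:

```python
def max_tokens_per_sec(bandwidth_gb_s, model_size_gb):
    # Each decoded token reads every weight once, so bandwidth is the ceiling;
    # real throughput lands below this.
    return bandwidth_gb_s / model_size_gb

print(max_tokens_per_sec(1792, 20))  # rumored 5090, ~30B model at ~5 bpw: ~90 t/s
print(max_tokens_per_sec(100, 20))   # same model in dual-channel DDR5: ~5 t/s
```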
katiecharm@reddit
I know, this is really damn sad. There should have been 48GB minimum in that card, and they could have even surprised us with 64GB.
barefoot_twig@reddit
Nvidia specifically wants to make three different types of people pay three different types of money. They want the AI people to pay the most, Crypto to pay the middle and gaming to pay the least. The problem is that they must create products that, for example, cater to gamers, while also not eating into their AI cash cow. If a 5090 could handle AI at scale, rest assured Meta would stop paying $40k for H100s (or whatever).
The low memory is intentionally freezing us out for exactly this reason.
satireplusplus@reddit
Until a competitor comes along and offers more VRAM for less money. AMD and/or Intel need to get their shit together, now's the time to eat into NVDA's moat.
Massive-Question-550@reddit
True, you can see the weirdly matched behaviors of these companies, which makes you think they either have some sort of agreement going on, or they all share the exact same mentality, or one company acts as the trend leader and everyone just follows.
robogame_dev@reddit
From a raw memory perspective, $6k will get you a Mac M2 Ultra with 192GB of unified memory - it’ll be slower than a 50XX setup for models that the 50XX can fit in its VRAM, but a lot faster for models that the 50XX cannot.
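A back-of-envelope comparison using the bandwidth-bound rule of thumb (all numbers approximate: ~800 GB/s for the M2 Ultra, ~100 GB/s for dual-channel DDR5 system RAM):

```python
model_gb = 100                 # a model far too big for 32GB of VRAM
print(800 / model_gb)          # M2 Ultra, all in unified memory: ~8 t/s

in_vram_gb = 28                # what fits on the 5090; the rest spills to DDR5
t = in_vram_gb / 1792 + (model_gb - in_vram_gb) / 100
print(1 / t)                   # 5090 + spillover: ~1.4 t/s
```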
Massive-Question-550@reddit
Same thing with CPUs. Why the hell are we stuck with dual-channel memory? Finally laptops and all-in-one workstations are increasing it with 8-channel memory, so the days of overpriced Threadrippers will be gone, and maybe we can get quad-channel memory as a standard.
Billy462@reddit
I'm not sure anymore. The datacentre stuff by this point is so differentiated that if you're seriously doing model training you need all the additional stuff like interconnect. More VRAM on a 5090 wouldn't change any of that.
For inference a bunch of other stuff is coming online and that might just move away from GPUs completely quite soon (at scale).
Igoory@reddit
Yeah, the data center stuff also has a much lower energy consumption than the gaming GPUs.
fullouterjoin@reddit
Power matters not because they're running them 24/7 - it's that they're power-limited in their DCs and can't exceed the HVAC limits. If they could simply trade money (heat/power) for more compute, they would.
dantonthegreatdanton@reddit
This is the answer: HPC clusters, DCs, and private clouds buy infrastructure by the BTU and by energy requirements.
I just bought 16x h200 for my private cloud and use 4090s for my personal rigs. The lack of interconnect and proven solutions for a larger cluster from consumer cards is a hard pass.
The tdp difference is real and breaks the wallet at scale.
1 megawatt of capacity gets you roughly 201TB of VRAM in an H200 cluster, and about 53TB in a 4090 or 5090 cluster at a fraction of the performance.
That's not including any other equipment needed to make it work, in which case the VRAM number for the 4090/5090 drops much faster than for the H200.
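The arithmetic, roughly, using card TDPs and VRAM alone (ignores CPUs, networking, and cooling overhead):

```python
def vram_tb_per_megawatt(tdp_w, vram_gb):
    return 1_000_000 / tdp_w * vram_gb / 1000

print(vram_tb_per_megawatt(700, 141))  # H200 SXM: ~201 TB
print(vram_tb_per_megawatt(450, 24))   # 4090:     ~53 TB
print(vram_tb_per_megawatt(575, 32))   # 5090:     ~56 TB
```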
smflx@reddit
Did you buy H200 SXM or NVL? Could you let me know the prices?
I waited for the B100, but it seems it's not available in PCIe card form. So I'm considering the H200 too.
evia89@reddit
Undervolting, limiting TDP, and overclocking VRAM takes a few minutes...
stylist-trend@reddit
For the least energy consumption, you could also just completely unplug the GPU.
Otherwise, I assume Igoory wants energy efficiency, and you're not going to get anywhere near H100 efficiency by just downvolting and overclocking.
great_waldini@reddit
Curious if you could share some examples?
ozzie123@reddit
This. Even if the 5090 were 48GB, these AI companies would not suddenly buy 5090s in bulk, because they need multiple GPUs and the inter-GPU bandwidth is not there.
Desm0nt@reddit
If Nvidia made something like a 4090 Super Ti / 3090 Super Ti with 48GB but old GDDR6 (not even 6X), companies would not be interested in it due to the slow memory (compared to HBM2) and high power consumption. But as a single card in a gamer PC for home use, it would be reasonably good. And it could be produced in limited quantities on the old hardware remaining in stock and sold at an increased price (it's limited!) but still cheaper than new.
At current 5090 prices, in 1-2 years it will be more interesting and profitable to pick up an old used A100 80GB with HBM2e for the same money - more (and still fast) memory with reasonable power consumption and heat dissipation.
barefoot_twig@reddit
You're right that Meta wouldn't, but some companies might. Take that number to 64GB and it's even more likely. Up to 80GB... even more.
OnedaythatIbecomeyou@reddit
I dunno honestly. I agree somewhere around 64GB onwards, but I'm not convinced at all that 48GB would be a business-model-breaking decision. Surely the only businesses that would do this would be 60-year-old transport company directors who hand the 5090 to their web developer and think he's gonna produce road autonomy haha.
jimmystar889@reddit
48GB would certainly pull buyers from the A6000 down to the 5090.
OnedaythatIbecomeyou@reddit
But by how much, are you speculating?
Is there any AI-specific performance nuance that balances out the GDDR6 vs GDDR7 difference between the two?
Wouldn't there be massive issues sourcing them, driving scarcity even further up, with scalper prices eroding the bang-for-buck argument?
All three are questions, not arguments. I have only consumer AI experience and ollama installed lol.
Also it's a 4, going on 5 year old card though, it's reasonable lol.
jimmystar889@reddit
I meant the A6000 Ada. 3 years old, but still.
killver@reddit
You can definitely train more cost-efficiently on those 5090s, even at large scale. The only thing holding some back would be the ToS, which disallows this to some degree, but many won't really care.
CubicleHermit@reddit
The comparable one is the workstation RTX 6000 Ada, with 48GB today; they run about $7500.
I'd expect the RTX 6000 Blackwell to be the same 2:1 memory, so 64GB. Probably not available for a while except preinstalled in a workstation machine, but they'll hit the retail market eventually.
Bougie_Mane@reddit
"For inference a bunch of other stuff is coming online and that might just move away from GPUs completely quite soon (at scale)."
I'm interested in reading up about this. Can you point me in the right direction? Is it ASICs like crypto mining?
barefoot_twig@reddit
Training isn't the only thing they do
TheProgrammer-231@reddit
I think nVidia doesn’t allow consumer cards in data centers though.
martinerous@reddit
Right, Nvidia could easily find other (legal, etc.) ways to deny AI data centers from using consumer hardware.
satireplusplus@reddit
Tell me how that's going to be enforceable. Spyware in the GPU? lol...
CautiousAd4407@reddit
They simply won't sell to you, forcing you to give up whatever bulk discounts exist at those scales.
emprahsFury@reddit
how about mandatory 3 slot cooling
satireplusplus@reddit
nobody that trains models uses windows
martinerous@reddit
Not in "underground" (or China) data centers, of course. The "normal" data centers won't risk going illegal.
satireplusplus@reddit
They can put anything they like into their EULAs; that doesn't make it enforceable and doesn't make it legally sound. At least in the EU, if you buy a GPU you can do whatever the hell you want with it.
emprahsFury@reddit
It does become enforceable when they find out what's happening. "We used 10k Nvidia GPUs to train the monster." Ken, later that day: "I don't remember selling 10k H-series to Meta."
AmericanNewt8@reddit
Funny because that's how all the Chinese players are basically working now lol.
C_Madison@reddit
As satireplusplus wrote: Not in the EU. Zuck can train their things as long as they want in the EU with nVidia consumer cards, no matter what nVidia says.
emprahsFury@reddit
As Dr. Evil in America said, "You're just not getting it, are you?"
No_Afternoon_4260@reddit
Yeap
barefoot_twig@reddit
Nvidia has all sorts of restrictions, but where there's billions of dollars a year in savings on the line, there are also clever people to solve that problem
Physical-King-5432@reddit
Nvidia cards are not that useful for crypto anymore. The best miners use ASICS these days
Background_Army8618@reddit
Correct, mostly - ASICs replaced GPUs for Bitcoin over 10 years ago. The "recent" crypto mining shortage ended 2-3 years ago because Ethereum switched to staking; it was not replaced by ASICs.
The comment you responded to is outdated at best, but incorrect regardless. Crypto mining is not a factor for Nvidia's market or consumer base in any way at this point.
zakkord@reddit
Ethereum only provided about $2.3 million in daily issuance for miners. The altcoin market is several times bigger today, but most of it gets eaten up by ASICs (not Bitcoin ASICs - altcoin ASICs), and the new ones are coming out priced in the $10,000 range. His statement is 100% correct.
TheThoccnessMonster@reddit
No one is using GPUs to mine coins really though was his point and that’s correct.
barefoot_twig@reddit
I'm no expert, but I've heard that altcoins, electricity costs, and the strength of the market all play a role in whether GPU mining rigs are profitable at any given time. And some people just view them as "cheap heaters" in the winter.
kurtcop101@reddit
There's no serious money there; anyone with genuine money isn't running GPUs for pennies an hour gambling on crap coins. And as a heater, any GPU big enough that Nvidia cares about your market segment costs you more than just running the furnace.
Generally you're looking at gamers who already have a GPU for gaming, are mostly uneducated about the crypto market, usually 18-25, bought the GPU with early savings because they live or lived with their parents, and are trying to make a few extra bucks - especially if they don't pay the power bill and don't get to control their thermostat.
Or some variation of that.
The moment you pay your own power bill.. different story, even in my area where it's cheap.
great_waldini@reddit
I don’t think it’s so contrived. HBM is a component in very high demand, and the world's production capacity for it is finite. Hence they sell it sparingly and charge accordingly.
bplturner@reddit
Gaming - Professional - AI
Ummite69@reddit
I completely agree. That’s why I keep wondering: why doesn’t AMD go all in with something like 64GB or 128GB of GDDR6? If I’m gaming, I’d choose the 5090, but for AI, AMD would be my go-to. Sure, some argue that AMD’s drivers aren’t as polished, but if they offer powerful, AI-capable cards to the general public, open-source innovations would likely emerge quickly.
pc_g33k@reddit
I hope game developers push higher-resolution textures and other things that require larger VRAM in their next-generation games, so that NVIDIA has no choice but to add more VRAM to its gaming graphics cards. 😂
larrytheevilbunnie@reddit
I'm just happy the 5090 has more vram than the 4090 (I know this is an abusive situation)
trailsman@reddit
Besides that, I think they are handicapping the cards, given no real competition, so they're not future-proof for more than one generation - guaranteeing themselves upgrade cycles and recurring revenue.
aprx4@reddit
They don't do that because they know 48 or 64GB with that memory bandwidth would cannibalize a lot of their own professional cards.
Desm0nt@reddit
They can reduce memory bandwidth on a high-VRAM card =) And limit the PCIe lanes. They already do this with the 4060 Ti, for example. Nothing stops them from making two card series - a fast one (for gamers) and a big one (for home ML) - and grabbing money from both separately.
katiecharm@reddit
So the problem is monopoly. Got it
hackeristi@reddit
Yeah, why do you think Nvidia employees are all millionaires... giving us 48/64GB cards is not going to make them richer. Come on, do not be selfish. Sorry Jensen. He did not mean that.
Specific-Local6073@reddit
Multiple cards draw multiples of the power.
infiniteContrast@reddit
For local use power is not a problem.
Enough-Meringue4745@reddit
Hmm? What do you mean? North America only has 1200w outlets 😂
RyiahTelenna@reddit
North America has 240V outlets. They're not that expensive to run.
Enough-Meringue4745@reddit
Not standard they don’t. Most households only have one or two outlets that run at 240v.
RyiahTelenna@reddit
Correct. They're not standard but they're very affordable to have run.
Enough-Meringue4745@reddit
I did one for my EV but in a condo it’s much more difficult lol
thedudear@reddit
1800w.
Enough-Meringue4745@reddit
Max 12 amps from a single standard 120v15a outlet.
thedudear@reddit
Which would be 1440W. And if the outlet is 120V/15A, then it's 1800W.
Nothing you've said agrees with any other part.
CubicleHermit@reddit
You should only continuously load a circuit to 80%. Nominal 120V can be as low as 110V, so 1300W is a pretty typical nominal limit for a regular domestic outlet.
I ran a dedicated 20A circuit for my home server. 1500W is the largest PSU I've seen from mainstream brands, but there are 1800W units from minor ones.
Enough-Meringue4745@reddit
Appliances should never draw a constant 1800w. Look it up.
PikaPikaDude@reddit
You'll then have to plug your 6090 in on a different circuit from the rest of your pc and peripherals.
7090 will come with a custom made circuit combiner from NVidia that burns your house down.
No_Afternoon_4260@reddit
At some point they'll ship with a small portable nuclear reactor
Specific-Local6073@reddit
Of course it is a problem. Electricity bill is the problem.
satireplusplus@reddit
Of course it's a problem. More watts = more heat = a higher electricity bill.
TurpentineEnjoyer@reddit
The 3090 is a 350-watt card you can undervolt to 300W and retain 95% of performance; the 5090 is rumoured to be at least a 550W card, with no information available yet on undervolt performance.
It all depends on whether the speed/available VRAM is worth the watts.
satireplusplus@reddit
You can also make it use 240W at 90% perf. It ships consuming 350 watts so that it looks good in gaming benchmarks.
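For anyone who wants to try, a sketch of setting the power cap programmatically via NVML (assumes the nvidia-ml-py package and root; it's the same knob as `nvidia-smi -pl`, and the 300W value is illustrative):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
pynvml.nvmlDeviceSetPowerManagementLimit(handle, 300_000)  # milliwatts
print(pynvml.nvmlDeviceGetPowerManagementLimit(handle))    # confirm the new cap
pynvml.nvmlShutdown()
```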
Distinct_Ad3556@reddit
575w actually 😂
unlikely_ending@reddit
This
salec65@reddit
My guess is they are already looking at potential 5080 Ti and 5090 Ti SKUs at 24GB and 48GB respectively for next year, but at a higher cost and only if the market pushes them for it. Otherwise we'll likely see workstation GPUs with 48/64GB come out in their place, at 5x the cost.
devnullopinions@reddit
They won’t because if you want inferencing they want you to buy their way more expensive professional cards.
2str8_njag@reddit
Guys, they will not give us more than 32 gigs of VRAM for a couple gens in GeForce. It's a gaming card and you all should understand this. There's absolutely no reason for Nvidia to do this, like at all.
ItsAMeUsernamio@reddit
They're not giving enough VRAM to gamers either; all the mainstream cards are stuck at 8 and 12GB.
entmike@reddit
8-12GB? What do you mean? The 5080 has 16GB, and isn't that mainstream? That's not to mention the 5090 having 32GB and the 3090/4090 having 24GB.
noiserr@reddit
5080 will be over $1000. That's not really mainstream for a PC component.
ItsAMeUsernamio@reddit
The 80 cards were always considered enthusiast level, above which would be the Titan (renamed as 90 since the 30 series). The most popular cards are by far 50 and 60s according to Steam hardware surveys.
https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam
n8mo@reddit
At the price the 5080 is projected to land at? No, that will absolutely not be mainstream. Hell, I haven’t seen the steam hardware survey for this year, but I’d bet most people are still running 30 series and below.
You can build an entire PC (AMD GPU included) for the price of a single high end NVIDIA gaming card.
old_leech@reddit
This is where I disagree.
There IS a reason -- and that reason is skill adoption.
As a regular Joe, I can justify an expensive card that serves multiple interests. If I were in school and wanting to learn to stand up architecture, I could justify the investment for a card that allows me to both learn and game. Having access to the tools ensures adoption and normalization of those tools.
But it becomes a problem when the tools are paywalled to such a degree that it bars entrance.
It's why Adobe turned a blind eye to piracy and Microsoft never really cared if Bobby installed a bootleg Windows... they knew that availability kept their tools at the foremost of the people that were interested.
Things get more difficult when top (consumer) dollar is going to leave you feeling restricted out of the gate -- and this is where deep tinkerers will justify their time over their wallets. Necessity is the mother of invention, and people need to eat.
We are one major announcement - ROCm, oneAPI, or something entirely new - away from a paradigm shift. I can't imagine a life where I wasn't bound to CUDA in one way or another... but man, I really, really want to.
Gearsper29@reddit
That's not possible. For the last few years we've been stuck with 2GB memory chips, so right now the only choice on a 512-bit bus is 32GB of VRAM. 64GB is technically possible if they sandwich the chips in clamshell mode, but that's the differentiating factor of the prosumer GPUs, so sadly they're not gonna do it for gaming GPUs.
BuildAQuad@reddit
I'm just hoping Intel ends up doing this; getting loads of hobbyists and open-source communities using their cards would help drive adoption, compatibility, etc.
Gearsper29@reddit
Sadly, the biggest Intel GPU will have a 256-bit memory bus, so it will be limited to 16GB, or 32GB clamshell, at best. The only hope for more is the 3GB memory chips coming out later this year. So two years from now we could have a 50% increase in VRAM in every GPU category.
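The constraint is mechanical: each GDDR chip hangs off a 32-bit channel, so bus width times chip capacity caps the VRAM. A sketch ("clamshell" meaning two chips sandwiched per channel):

```python
def max_vram_gb(bus_bits, chip_gb, clamshell=False):
    chips = bus_bits // 32                 # one chip per 32-bit channel
    return chips * chip_gb * (2 if clamshell else 1)

print(max_vram_gb(512, 2))        # 5090-style bus, 2GB chips: 32GB
print(max_vram_gb(512, 2, True))  # same bus, clamshell: 64GB
print(max_vram_gb(256, 3))        # 256-bit bus, 3GB chips: 24GB (+50% over 16GB)
```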
sassydodo@reddit
Gaming cards don't really need even 32GB of VRAM, so "consumer" needs is a bit of a stretch here.
Thrumpwart@reddit
The first GPU manufacturer to put 48GB Vram in a consumer GPU is going to make tens of billions.
moldyjellybean@reddit
Yeah man, it’s like Apple selling fools 8GB of soldered RAM and 256GB of storage for like 9 years, and charging $200 for 8GB extra or $1000 for 2TB of storage. I upgraded a ThinkPad to 64GB of RAM for $90 and 2TB for $100, then sold off the original 32GB and 512GB - so in all, I upgraded to 64GB of RAM and 2TB for less than $100.
Educational_Gap5867@reddit
Personally as a gamer and an LLM enthusiast I’d prefer if they make 2 separate cards. One for DL and one for gaming. This will keep the pricing down for gaming cards as well as allow for DL cards to come with a lot more memory although maybe older and far fewer shader cores.
sleepy_roger@reddit
🤣 Well sir this is your lucky day!
Ancalagon_TheWhite@reddit
They do. It's called the A100 pcie and RTX A6000
Educational_Gap5867@reddit
But those are even more expensive lol
Ancalagon_TheWhite@reddit
Exactly. The gaming series are already the budget GPUs.
ThenExtension9196@reddit
On the other hand, you can buy an RTX 6000 Ada workstation card for $8k. It has 48GB of memory, if you want it that badly.
Majinvegito123@reddit
Does anyone know what the performance gains are expected to be
haikusbot@reddit
Does anyone know
What the performance gains are
Expected to be
- Majinvegito123
I detect haikus. And sometimes, successfully. Learn more about me.
Opt out of replies: "haikusbot opt out" | Delete my comment: "haikusbot delete"
mrSidX@reddit
Prithee, good sirs and gentle dames,
Doth any soul here knoweth well
What measure of swiftness or of might
We may expect to gain anon?
Hear me, ye gathered minds of wit and wonder!
Doth any ‘mong thee ken or whisper know,
What boon of swiftness, what gain of might,
May spring from yonder toil and tinkering?
For I am plagued with doubt most foul —
Shall our labors yield but smoke and sighs,
Or shall they blaze with glory unforeseen?
Speak, I entreat thee! What gains of performance
Doth destiny herself decree we reap?
-Majinvegito123
AsliReddington@reddit
Hardware FP4/FP6 support is expected to enable lower precision while retaining higher accuracy across models, letting you run large models like 70B (with some offloading) or MoE models like Mixtral at very high speeds without INT4's quality loss.
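The VRAM math behind that claim, approximately (weights only, ignoring KV cache and activations):

```python
params = 70e9  # a 70B model
for bits in (16, 8, 6, 4):
    print(f"{bits}-bit: {params * bits / 8 / 1e9:.0f} GB")
# 16-bit: 140 GB ... 4-bit: 35 GB - just over a 5090's 32GB,
# hence "with some offloading".
```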
emprahsFury@reddit
Nvidia's CES keynote is tonight, so I guess we'll find out.
Caffdy@reddit
About three fiddy
TurpentineEnjoyer@reddit
And is also guaranteed to cost over $2000, and likely $2500
We're rapidly approaching the point where business GPUs are looking like a bargain compared to the "prosumer" rip off.
TechNerd10191@reddit (OP)
Correct, but even $2500 could be almost half the price of RTX 5000/6000 Ada, not to mention the L40 ones.
Of course, if you need >4 GPUs, the business GPUs are a no-brainer.
TurpentineEnjoyer@reddit
I just don't have any hype for the 5090 at all given what we know about it so far.
Based on UK pricing it's likely to be over £2000 meanwhile I can get used 3090s for £650, so I could get 3x 3090s for the same price. 72GB vs 32GB. Eh.
It might be potentially twice as fast as a 3090 for processing, but I feel like once you're into the realm of *NEEDING* more than 10 t/s inference on 72GB then we're outside the realms of casual user and quite possibly into the realms of business users anyway, who'll be thinking about long term pricing of wattage vs upfront cost and picking the business hardware.
The 5090 is too expensive for hobbyists, and too inefficient for professionals.
It might have been an attractive option if it came with 48GB of VRAM but as it stands, it's just not really a good card in the LLM space, and I think that's exactly as intended by nvidia.
Nrgte@reddit
It's a gaming card. Gamers don't need to run huge AI models on their PC. If you want to do serious AI stuff get a A6000 or a A100.
katiecharm@reddit
As someone who owns a 4090 and would love a reason to be hyped about the 5090…. I’m just not.
By the time the PS6 generation of consoles comes out, there will likely be a 6090 card that has 64GB of VRAM and actually feels like a real step up.
Desm0nt@reddit
Probably with a unified memory solution like Apple/AMD, rather than a dedicated card faster and bigger than a PC card.
Caffdy@reddit
That's just 2 years from now, no way in hell they jump from 32 to 64 in just one gen
katiecharm@reddit
That’s how technology usually goes, and has gone for decades. Nvidia has been purposefully gimping this cycle but I don’t think they can do so much longer.
When one generation has 256GB of hard drive space, the next generation doesn’t come out with 290.
Usually it gets doubled.
Caffdy@reddit
I'm sure the next RTX 6090 is gonna be released with 32GB again. By 2030, maybe we will get 48GB with the RTX 70 series
noiserr@reddit
Bold of you to assume Nvidia will give us 64GB of VRAM in 2026. They've been stuck on 8GB for lower end models since 2017.
mxforest@reddit
Inference-time compute is the future. Running a smaller model at higher t/s matters more in agentic workflows than a bigger model at lower t/s. 3x 3090 will not only be slower because of a larger, split-up model; they will also be slower because of communication overhead. On top of that, the 3090s are basically out of warranty with the potential to become deadweight, whereas a 5090 is a safe "investment" because it can be RMA'd.
TurpentineEnjoyer@reddit
Second-hand stores, at least here in the UK, are still obligated to give you a 1-year warranty - one store, CEX, offers a 5-year warranty that covers GPUs at no additional charge. I realise that will vary from location to location.
As far as inference speed being the future - in what context? What's the minimum acceptable tokens per second? Because I can hit 50 tps on an 8b model on a single 3090 - is 50 tps too slow?
Like I mentioned before it feels like once we're talking about the cost per minute of usage, we're outside the realms of consumer hardware.
mxforest@reddit
8B models are on the border of what is usable. They are not good, just OK. A true agent will have multimodality with vision and speech input. That will push the borderline to higher parameter counts.
Desm0nt@reddit
Well, 3x 3090 can run 3 small agents independently, which sounds better than 1 card with shared resources. And not everyone needs multiple agents for... whatever you use them for. Many people still use them as chatbot/RP/coding assistants that work as a single agent, where 1 clever big model is better than multiple stupid fast ones.
PawelSalsa@reddit
I don't get it. Why do you need 100 t/s instead of 10? Just for the sake of speed, without any significant benefit in terms of usefulness?
ortegaalfredo@reddit
> 3x3090 will not only be slow because of a larger split up model, they will also be slow because of communication overhead and a low 900 GBps bandwidth
Incorrect. With tensor parallelism you effectively get 3x the bandwidth, and it will indeed run faster than a 5090. In modern TP implementations like vLLM, speed scales almost linearly with GPU count.
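A minimal sketch of what that looks like in vLLM (model name illustrative; note vLLM needs the attention head count to divide evenly by the tensor-parallel size, so 2- and 4-way splits are the safe defaults):

```python
from vllm import LLM, SamplingParams

# Each GPU holds a slice of every layer, so the per-token weight traffic
# is spread across all the memory buses at once.
llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct",
          tensor_parallel_size=2)

outputs = llm.generate(["Explain tensor parallelism in one sentence."],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```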
killver@reddit
Sharding sucks though for a lot of use cases and will always slow you down. This is why so many want and need larger memory on single cards, particularly for any training.
jd_3d@reddit
$1999
hyouko@reddit
...Are we? Like, if we're just going on the basis of cost per GB of VRAM: [comparison table omitted]
Obviously you get other benefits/capabilities from the H100 and friends (not so much the RTX Ada series, from what I've read...). But the consumer-focused GPUs are clearly the "bargains" here, insofar as there is any bargain to be had.
TurpentineEnjoyer@reddit
There's more to go off than just VRAM per dollar.
Supposedly they're built to a higher quality, designed to withstand constant heavy loads - although I don't have any ten-grand GPUs down the back of my sofa to put that to the test personally.
There's also dakka-per-watt, where pro cards draw less power for equivalent or superior performance, which adds up to long-term savings over the lifetime of the card, especially where electricity is expensive.
Then there are space constraints - how many PCIe lanes you have, and literal space too, with a single 2-slot card holding the same VRAM as two 3-slot cards.
It's also just a constantly climbing price because Nvidia has no real competition right now. There's no reason a $2500 card couldn't come with 64GB, other than that it would cannibalize their server card market. For a 64GB card at $2500 I'd be tempted, but as it stands I can get 3x 3090s for less money and have 72GB.
hyouko@reddit
Definitely there's more than just VRAM; if you're doing anything that requires interconnects then the pro-grade hardware is your only option. But your initial premise was that business GPUs are looking like a bargain, and then the example of a bargain you actually give is a trio of consumer cards from 2 generations ago. I think that's the better way to frame it: for many use cases, the value proposition isn't really improving from generation to generation in the way that it might if nVidia had actual competition.
(I don't think we will see a 64GB card, but it's just possible we could see a 48GB 5090 Ti at some point, with those 3GB GDDR7 chips that are supposedly used in the laptop 5090?)
TurpentineEnjoyer@reddit
My reason for believing business cards are starting to look like a bargain comes from:
https://www.reddit.com/r/LocalLLaMA/comments/1hv1efu/comment/m5pk8j3/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
Admittedly a different comment than the one you're replying to, but it justifies why I point out that 3x 3090 is better than 1x 5090.
tl;dr: the 3090 is plenty fast for hobbyist users. If you need more than 10 t/s on a model that runs in 70GB of RAM, then you're probably a business user anyway.
The 5090 can't justify its price for hobbyists, but pro users will pass it up for pro cards too.
emprahsFury@reddit
What's honestly crazy is that AMD and Intel also sell their GPUs at that cost, and they're literally worse in every way.
Having said that, you can get an eBay "new" W7900 (a 48GB 7900 XTX) for $2500, which does alter the math.
Natural_Home_769@reddit
I still remember building an entire system with a GTX 1070-ish for under $1000.
Now the GPU itself costs over $3000, at least in my Curryland. I can't afford it even as a 9-5 IT worker. $3000 is like $12,000 PPP-adjusted.
An RTX 3090 still costs around $2800 here.
katiecharm@reddit
Don’t feel bad eastern bro. Us over here in Burgerland can’t afford it either.
Natural_Home_769@reddit
Thanks mate for the generous word I feel really bad :(
Affectionate-Cap-600@reddit
same here... pizzaland
Vast-Breakfast-1201@reddit
My 1080ti system was about 1500 all in. Half that was the GPU.
Inflation at 7 years would put that around 1900 bucks. Instead the GPU costs 2000-2500 or whatever and it isn't even Ti or super, so it's a level down on a per-generation basis. Plus whatever else to build the thing. And not counting any tariffs that will get laid down.
Tbh we need an economy crash pretty quick here. We have gone too long without, so nobody has any idea what the actual price of anything is.
ThenExtension9196@reddit
I’ll pay no problem for a 5090. Nothing else like it. It’s the workstation cards that hurt.
AppearanceHeavy6724@reddit
3x 5060 could be a bit more interesting of a proposition.
BuildAQuad@reddit
Probably not costing the same as used 3090s tbh
AppearanceHeavy6724@reddit
Why? 2x 3090 is $1500; 3x 5060 is probably $1200 or so.
Xyzzymoon@reddit
2x 4060 Ti 16GB already cost more than $1200, so why would 3x 5060 in any variant be cheaper, if it comes with anywhere near 16GB of VRAM?
AppearanceHeavy6724@reddit
Because otherwise no one would buy an x060 card for more than $400. Pay attention: not the Ti - the vanilla 4060 is $300. So I expect the 16GB 5060 to be $400. The Ti is more expensive not because of the memory, but due to the faster chip.
yamosin@reddit
Not great news for LLMs, because for LLMs VRAM size is far more important than bandwidth.
But it's good news for Stable Diffusion users - the it/s improvement should be dramatic.
TastesLikeOwlbear@reddit
I will not be at all surprised if the 5090's 32GiB is partitioned in some way designed to limit nontrivial AI usage.
I.e., they'll say that it's got 24GiB for graphics to match the 3090/4090 and then an extra "AI dedicated" 8GiB "to support having AI models in games without impacting graphical fidelity!"
And then they'll do something dastardly in the hardware to hamstring using the full 32GiB for AI so it doesn't cut into their lucrative workstation and datacenter business.
I look forward to being wrong about this.
segmond@reddit
From my calculation it's going to be about 2.5x-3x faster than the 3090 for inference. So the question is: would you prefer 3-4 3090s (72-96GB of VRAM) or 32GB? For huge models, more 3090s will beat a partial fit in the 5090's 32GB. Models have gotten insanely large: for Llama you need at least 70B, Mistral 100B, DeepSeek 600B+, Grok 300B+, etc.
Hialgo@reddit
I don't caaaaaaare I just want the VRAM
CystralSkye@reddit
The 5090 isn't made for AI it's made for gamers.
synn89@reddit
At this point I couldn't care less about raw speed. I really just want something at 3090 level with way more VRAM, moderate power draw, and a reasonable price.
CystralSkye@reddit
You do understand that you are asking a gaming card to be something it isn't?
There is huge demand for a card like that, and it exists; you aren't going to find a way to get it cheaper. The cheaper version is the 3090.
You need to wait a couple of years until demand runs low.
SandboChang@reddit
I really think this is a gap that AMD or Intel can try to fill up. Not sure how big a market local LLM really has at the moment though, might just be too small to bother.
barefoot_twig@reddit
AMD and Intel would be much more interested in eating into the enterprise AI chip market. If they can, then the prices could become somewhat reasonable for the LocalLLM crowd, but don't hold your breath.
Hasuto@reddit
Both Intel and AMD are making enterprise AI chips similar to Nvidias offerings.
They are slightly cheaper (but still not purchasable by normal people) and usually slightly worse than the Nvidia offerings.
I find it doubtful that either of them will be more interested in offering consumer grade chips that outclass their enterprise offerings than Nvidia is.
emprahsFury@reddit
that's an amd w7900, which is a 7900xtx w/ 48gb
barefoot_twig@reddit
If that were an option, everyone would want it as well. What you're describing is an A100 80GB, and they're still >$10k used for a reason.
psychicsword@reddit
We need those cards to start to go end of life so they start to appear on the open market like the Tesla K80 cards
iamthewhatt@reddit
lol, as big as AI is right now, EOL cards will still be ~$10k. The 4090 has been out for 2 and a half years and its price has never dropped below MSRP anywhere, for this very reason.
Wrong-Historian@reddit
2x 3090 in tensor parallel will give you ~1.6x the inference speed of a single 3090 (i.e. ~"1.5TB/s" effective), cost ~$1400, and have 48GB.
3090s remain king.
mxforest@reddit
I really think Nvidia has a trick up their sleeve which nobody knows about yet. Per leaks, they have a 3-way split power delivery system, while the 4090 has 2. They have put in some other AI components that require separate power. If it's relevant for inference-time compute, then this changes the efficiency numbers.
Affectionate-Cap-600@reddit
Interesting... thank you, I missed that!
TurpentineEnjoyer@reddit
What are you using for tensor parallelism? I've heard of vLLM and one other I've forgotten.
bick_nyers@reddit
EXL2 quants run pretty fast split along 2+ cards. Available via Oobabooga or TabbyAPI.
TheGuardianInTheBall@reddit
I love how every piece of software related to AI sound so silly.
Oobabooga, TabbyAPI, HuggingFace, ComfyUi, Ollama.
Peak Camelot.
iamthegemfinder@reddit
I feel like the prevalence of these sillier, more abstract names must partly come from the fact that naming your ML project/company any more seriously, and still standing out, is a challenge - there are only so many permutations of "chat", "web", "ui", "agent", and the like; the pool of relevant terminology is just lacking 😭
Or there's the secret third option: naming your thing some Sleek Compound Word like, idk, Fireflow or Readychain. Only do this if you are interested in shipping an overstimulating landing page overdosed with parallax scroll effects and fading into obscurity immediately.
I may be reading too much into this.. :P
ArsNeph@reddit
Speaking of silly, you forgot SillyTavern XD
TheGuardianInTheBall@reddit
I wasn't aware of it, but looks like fun, might spin it up.
TurpentineEnjoyer@reddit
It kind of advertises itself as a roleplay-focused web UI, but honestly the features it provides are pretty powerful for a general-purpose web UI too.
You can add context blocks that can be turned on and off, have multiple characters with different personalities interact in a single chat, write your own plugins that utilize function calling, etc.
TheGuardianInTheBall@reddit
Funnily enough, I've been thinking of doing something like this myself for a little tabletop RPG I'm developing.
TurpentineEnjoyer@reddit
ooba is what I'm using right now, but as far as I understand it isn't true tensor parallelism.
I'm loath to leave ooba since I love how easy it is to switch between models and get set up, but a 60% speed boost is tempting.
bick_nyers@reddit
In the EXL2 loaders there is an enable_tp option.
TurpentineEnjoyer@reddit
Ah interesting, I'll give it a try.
I've been using GGUF for so long that I forgot ooba can actually do other things too
Wrong-Historian@reddit
mlc-llm!
TurpentineEnjoyer@reddit
That's the one! Thanks
koalfied-coder@reddit
Historian here has the facts!
Healthy-Nebula-3603@reddit
Splitting an LLM onto 2 cards does not increase the model's speed... it runs at the same speed as on 1 card, just with extra VRAM.
Pretend_Regret8237@reddit
Stop with the unfounded rumors; we have a few hours left until the reveal. This is clickbait.
teh_mICON@reddit
I'm fucking over the "rumors" for months now
custodiam99@reddit
I feel like GPUs are becoming irrelevant in 2025. We will need a powerful CPU and 512GB of DDR5 memory instead.
__some__guy@reddit
No good mainboards for Epyc 9005, and it starts looking like 12-channel may still be too slow.
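The 12-channel arithmetic, assuming DDR5-6000 (Epyc 9005 officially supports up to DDR5-6400):

```python
channels, bytes_per_beat, mt_s = 12, 8, 6000e6  # 64-bit channels
print(channels * bytes_per_beat * mt_s / 1e9)   # ~576 GB/s peak
# Roughly a third of the rumored 5090, and real throughput lands well below peak.
```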
Thomas-Lore@reddit
Might be, if there are more models like DeepSeek V3.
ForsookComparison@reddit
7900xt blowers go brrr for $699
iamthewhatt@reddit
man if AMD actually had a CUDA competitor, they would be rolling in the dough... Out of all their billions in profits, you would think they would put together a team to tackle that.
virtualmnemonic@reddit
The problem is that today's software - with billions poured into it - is written for CUDA, so even an equivalent CUDA competitor isn't enough. They've tried to address this with solutions like ZLUDA, but there's a performance penalty and/or bugs. NVIDIA has been investing in CUDA for years and it's really paid off.
ForsookComparison@reddit
They don't even need a competitor. They just need Lisa Su on stage in front of a roadmap for ROCm, or some adoption story that would make them viable for training.
But they're not doing that, and ROCm is growing at a relative snail's pace. I don't get it. Is their strategy really to be the inference company?
iamthewhatt@reddit
I am not that learned in the alternative technology spaces, but the reason why I say "competitor to CUDA" is mostly because of the popular software that uses CUDA, which could also be using ROCm equally were AMD to support it. Or some other technology that leverages existing infrastructure to mimic CUDA (like ZLUDA but less legally ambiguous). Here's to hoping they announce something today 🤞
estebansaa@reddit
Not interested unless I can run DeepSeek at a decent level.
creo_one@reddit
Behold, even more useless data being sent to GPU for no reason
Echo9Zulu-@reddit
I mean what consumer hardware could possibly support utilizing that memory bandwidth?
un_passant@reddit
Can p2p be unlocked with a custom driver like for the 4090 ?
If NVIDIA fixed that hole, I'll pass.
CockBrother@reddit
If there's an A6000 Blackwell with 64GB of RAM, then we're really starting something interesting. It'd probably cost about $10k though.
The 24-to-32GB jump in consumer space is "nice" but doesn't change anything dramatically. You can probably get two consumer cards in a regular PC without fuss. Anything more than that and you're looking at replacing the ridiculously thick heat sinks (dangerous) or PCIe extenders and related goofiness.
I don't need a five slot thick GPU that needs an additional slot of clearance for airflow to make me feel good about how heavy and bulky it is.
pointer_to_null@reddit
That's a reasonable guess. Judging by the predecessors' naming, I predict it'll officially be called "RTX 6000 Blackwell Generation". The "RTX A6000" vs "RTX 6000 Ada" brand confusion was comical, and felt intentional though.
But I agree with the other predictions, with one caveat: Nvidia will list an MSRP below $8000 on their website, but those will go OoS immediately while the remainder sell at ~$10k from partners like PNY.
Biggest_Cans@reddit
PHENOMENAL COSMIC POWER
itty bitty living space
garbo77@reddit
For the RTX 3090 budget I would probably go for the RTX A2000 Ada 20GB at 70W: am I wrong?
TechNerd10191@reddit (OP)
The A2000 has 12GB or 16GB. An A4500 (20GB on a 320-bit memory bus) would be a better alternative.
garbo77@reddit
My mistake, I meant A4000
TechNerd10191@reddit (OP)
A4500 > A4000 (check the Nvidia datasheets for each for proof)
joninco@reddit
That'd be a bad decision. The 3090 has 2x the memory bandwidth and 3x the tensor and CUDA cores.
Terminator857@reddit
I like how the box says "low noise". The 3090 gets loud under load.
treksis@reddit
time to upgrade
Calcidiol@reddit
One thing I don't fully understand is the technical and software differences/limitations between flagship consumer gaming GPUs like the 4090 and 5090 and their pro GPU counterparts.
Specifically with respect to memory: does this class of GPUs (3090, 4090, and presumably 5090) have the same level of memory integrity/reliability, in terms of ECC and EDAC, as the comparable pro GPUs?
I vaguely recall that some (all?) consumer GPUs may lack the same ECC architecture (maybe they literally don't have the extra bits). I've also seen comments about some kind of in-band EDAC/ECC scheme that functioned on some NV pro GPUs but might simply be disabled from effectively working on the consumer ones.
Again, I don't know the status quo or the history, which is why I'm asking; I've seen enough confusing things that I could believe either way whether there is some parity (pun intentional) of memory integrity between these classes of cards.
And as a PS for those who have been running 3090s and 4090s hard doing GPGPU experiments or whatever: what's your impression of the overall reliability of these things at all levels?
In the worst case one hears about melting power connectors (et al.) and missing-MLCC instability fiascos, and one has to wonder how much to trust these. Going over $1-2k with a mediocre 3-year warranty seems well out of "disposable toy" range to me and more like "long-term infrastructure" I'd want to trust on 10-year time scales - and there's a good lot of the board vendors I'd hesitate to buy anything from in general (customer service, SW support, ...).
shokuninstudio@reddit
Who can do price and specs estimates for the RTX 6000?