The RTX 5000 PRO (48GB) arrived and it is better than I expected.
Posted by Valuable-Run2129@reddit | LocalLLaMA | View on Reddit | 150 comments
I posted here about buying it a few days ago: https://www.reddit.com/r/LocalLLaMA/comments/1t2slmw/first_time_gpu_buyer_got_a_rtx_5000_pro_was_it_a/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
Before pulling the trigger I was leaning more towards a Mac Studio, but the prompt processing speeds I was reading about were giving me pause. The budget was $5000-6000, so the 256GB Mac Studio was out of the question.
I gambled and bought the RTX 5000 Pro with ZERO experience with PCs, how to build them, or what parts to buy... It was a good deal: I paid $4300 for the GPU including taxes (in the comments on that post I wrote $4700, but I was mistaken; I checked the receipt) and had to buy everything else for the computer. It ended up costing $5600 in total with 64 GB of RAM.
Assembling the thing was not easy for me as a total novice, but thankfully we have LLMs to guide us through these things.
Then came Linux and vLLM... Honestly I was totally lost; without Claude Code it would have been impossible, and I had no idea what settings to use to run Qwen3.6-27B-FP8 with a full precision cache. Thankfully this guy posted everything I needed to know to tell Claude what to do: https://www.reddit.com/r/LocalLLaMA/comments/1t46klu/qwen36_27b_fp8_runs_with_200k_tokens_of_bf16_kv/
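For anyone curious, here's roughly what the vLLM side of that looks like. A minimal sketch only: the parameter names are real vLLM options, but the values and model ID are illustrative assumptions, not necessarily the exact settings from the linked post.

```python
# Illustrative vLLM setup for FP8 weights with an unquantized (bf16) KV cache.
# Values are assumptions -- tune for your own card.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.6-27B-FP8",   # model ID assumed for illustration
    kv_cache_dtype="auto",          # "auto" keeps the KV cache in the model dtype (bf16), i.e. unquantized
    max_model_len=200_000,          # roughly what fits next to the FP8 weights on 48GB
    gpu_memory_utilization=0.90,    # tunable; opinions differ (see the gotcha further down the thread)
)

out = llm.generate("Hello!", SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```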
After burning through 50% of my Claude Code Max 20x weekly limits the thing now works, and I have to say... I made the right call. This thing rocks.
I'm getting up to 80 t/s in TG (more like 50-60 for very big prompts), which is phenomenal. But most importantly I'm getting 4400 tokens per second in PP!
The full precision cache fits only 200k tokens, but that is totally ok for me.
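The back-of-envelope math on why ~200k is the ceiling, with every number below assumed for illustration rather than measured:

```python
# Rough memory budget for a 48GB card: 27B model in FP8, bf16 KV cache.
vram_gb        = 48
fp8_weights_gb = 27     # ~1 byte per parameter for a 27B model
overhead_gb    = 3      # activations, CUDA graphs, fragmentation (guess)

kv_budget_gb = vram_gb - fp8_weights_gb - overhead_gb     # ~18 GB left for KV
per_token_kb = kv_budget_gb * 1e6 / 200_000               # implied KV footprint per token
print(f"~{per_token_kb:.0f} KB of bf16 KV per token")     # ~90 KB, plausible for a GQA model
```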
I honestly don't know why people are not talking about this GPU more. It costs just $1000 more than an RTX 5090, it can fit a 27B model at FP8 plus 200k of context at full precision, and it draws half the electricity... Sure, it is slightly less performant, but the numbers I'm getting are way more than I was expecting. Two 5090s would definitely beat this, but they would cost significantly more, be crazy noisy, and tear a hole in my pocket in electricity bills.
Guilty_Rooster_6708@reddit
Didn’t realize you can get a 5000 Pro for $4300… my girl is going to be so mad..
cruisereg@reddit
If she’s not your wife, trade her in for a better model ;)
Myarmhasteeth@reddit
Lmao I was just thinking something similar. After just having a 3090 I was looking for an upgrade… this post just gave me what will probably be the next one I get.
OutlandishnessIll466@reddit
Am also looking to replace my last P40. Already have 3x 3090. Now I could get a 4th 3090 turbo if I can find one, but I am also looking at the pro Blackwells. At first glance the 5000 48GB looks tempting, but at half the CUDA cores of the 6000 and similar memory bandwidth to a 3090, I am afraid it won't be much faster than the 3090s at anything but native FP8. The VRAM is nice, but I already have more than enough for Qwen 27B and there are no other interesting models in the 122B range currently. I want something faster than a 3090, so better to save for the 6000 Pro.
Valuable-Run2129@reddit (OP)
If you have the budget, go for the 6000. Otherwise the 5000 is a great deal. The electricity savings and noise reduction over two 3090s are significant.
Two good 3090s will cost you $3000 today. The $1000 difference is paid for by the electricity bill.
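Rough math on that claim, with assumed wattages, hours, and rates rather than measurements:

```python
# Back-of-envelope electricity savings: two 3090s vs one RTX PRO 5000.
watts_2x3090  = 2 * 350     # stock board power of two 3090s under load (assumed)
watts_pro5000 = 300         # RTX PRO 5000 Blackwell board power
hours_per_day = 8
usd_per_kwh   = 0.30        # assumed residential rate

def yearly_cost(watts: float) -> float:
    return watts / 1000 * hours_per_day * 365 * usd_per_kwh

savings = yearly_cost(watts_2x3090) - yearly_cost(watts_pro5000)
print(f"~${savings:.0f}/year")   # ~$350/year, so a $1000 gap closes in roughly 3 years
```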
Ill_Initiative_8793@reddit
Your best option is second 3090
Myarmhasteeth@reddit
Yes absolutely, the prices are just too high…
Guilty_Rooster_6708@reddit
People really do anything to get better PPs!!
Turbulent-Cupcake-66@reddit
Why do you need 64GB of RAM? I guess you will not offload any byte of the model to RAM?
MathmaticallyDialed@reddit
Nemotron has a million token context…
Orlandocollins@reddit
Yeah, it's just not competitively priced relative to the Pro 6000.
Bubbly-Staff-9452@reddit
Yeah, that was really the only thing keeping me from getting one, because if I could justify it I could justify a 6000 lol. The 72GB version is even worse value compared to the 6000.
Valuable-Run2129@reddit (OP)
I paid $4300. The RTX 6000 costs twice that. Can you justify double?
mrgalacticpresident@reddit
Don't let people talk you down. You can always upgrade.. and the RTX-6000 IS an upgrade. But in the end money is a resource and for hardware often good enough is good enough.
48GB is actually a sweet spot at the moment for running some 8-bit Qwen models - which in my experience improves a lot of the tool calling confidence you get.
As long as local LLMs are 1-2 years behind frontier models, frontier models can run at trillion-parameter scale in the cloud, and the providers heavily subsidize tokens, there is no way local LLMs are a reasonable investment unless it's for research, easy access, or privacy concerns.
smb3something@reddit
Privacy is a big one. But yeah, once the token gravy train runs out, it's gonna be mad what the real cost of the cloud services is. Renting the capacity to run even 48GB of VRAM is a lot per month; you'd pay for the card in a year or two.
Void-kun@reddit
When it reaches that point I'll switch to local.
But buying hardware now to prepare for that is a gamble. In the time that tokens are subsidized, better hardware may come out.
Rather just hold on and milk the subsidised API subscriptions as much as I can then switch to local LLM.
Hopefully we start to see hardware prices come down in the next couple of years. Not very hopeful but what else can we do?
Bubbly-Staff-9452@reddit
I've seen the 6000 for $8300, so less than double. And not only that, but more VRAM generally sells at a premium, not a discount. It's also the only one of the workstation cards that can perform on par with or exceed a 5090, so it's literally a do-everything card. That's why I couldn't justify the 5000: you can do a lot more with the 6000 for less than double the price.
Valuable-Run2129@reddit (OP)
I wanted to run a 30B model with FP8, full precision cache. I don't need the card for gaming. I don't game.
Sure, the 6000 might have given me 120 t/s TG instead of 80, and 7000 t/s PP instead of 4500... but at twice the price? It made no sense.
And an RTX 6000 doesn't really enable better models. I would need two RTX 6000s to run an actual step up from what I have now.
Plus the noise and the electricity bill.
xcel102@reddit
For me, 48GB is the sweet spot ... for 1 LLM.
96GB would have enabled me to bring up another model (say a VLM for image/video analysis). For that, I would've loved to get the 6000 if I had the budget. But alas, I don't. So I got the 5000 like you did 😀 Still waiting for a PC setup to power it up (Beelink eGPU dock).
Worldly-Plastic-2516@reddit
People have different use cases. For you it made sense, and that’s great!
I have a 5090 for different reasons but it makes more sense for me.
Sofakingwetoddead@reddit
You got a great GPU that's giving you the functionality you needed at better than expected speeds, and a price point that's p good in this current market. Congrats. I went with a 9700 just to have something to play with and when I see cuda speeds I am def a little jelly. 😃
Bubbly-Staff-9452@reddit
I mean, that's great that you have it, I'm not trying to put you down. I was just replying to the other person that it isn't competitively priced compared to the 6000. I'm not even getting another Blackwell card; I'm holding off until the workstation Rubin cards come out. And I'm just a random anyways, so don't take my opinion as fact just because I said it.
DAlmighty@reddit
Where have you seen a Pro 6000 for 8300? My wallet may hate you but I don’t.
RobotHavGunz@reddit
on sale at Microcenter right now for $8700 - On sale... Tempting... https://www.microcenter.com/product/694549/pny-nvidia-rtx-pro-6000-blackwell-workstation-edition-dual-fan-ai-workstation-graphics-card
FineManParticles@reddit
Got mine from Microcenter for $7999 (Max-Q) and signed up for their card, which gave me $800 back. At $7200 I think I'm good, since it's selling for $9200 open box now.
AlwaysLateToThaParty@reddit
I bought my 6000 pro by salary sacrificing aka paying for it through before-tax earnings. In Oz, it cost AUD$14500, which got reduced to $8250 out-of-pocket costs. That's about $5K US?
Bubbly-Staff-9452@reddit
It’s been a few weeks since I last looked, I think it was at Central Computers.
DAlmighty@reddit
You may have been thinking a few months vs. few weeks.
panchovix@reddit
6000 PRO does exceed a 5090 on gaming or PP speed.
Now the thing is, 6000 PRO is "justifiable" if you do it for AI (LLMs, or diffusion training/inference), but for gaming I don't think someone gets one just to surpass a 5090, right?
nagareteku@reddit
thrownawaymane@reddit
new Star Wars plot just dropped
IrisColt@reddit
Right? Right?
Long_comment_san@reddit
you were saying? oh you weren't
YOU_WONT_LIKE_IT@reddit
Link? An actual Blackwell pro 6000 for under $9k?
Anonymous_Prime99@reddit
Got the same 5000 48GB. Thought the same thing you did.
Fast forward a couple months, about to pick up two 6000 MAX Q's. The 5000 puts you in a league where you are above average, but just a hop away from the next level. If you outgrow what your 48GB can do, you might feel the same.
Valuable-Run2129@reddit (OP)
The issue is that the only jump up from the 5000 that makes sense is two 6000s. A single 6000 doesn't really let you use significantly better models.
And 2 6000s are not compatible with apartment living in a big city. I am pretty much stuck
vtkayaker@reddit
Don't worry, seriously. The RTX Pro 6000 is a fine piece of hardware. But you're 100% right that the RTX Pro 5000 is also excellent, and people don't talk about it enough.
zipzapbloop@reddit
"buy more (vram) save more" - jensen huang
Valuable-Run2129@reddit (OP)
wdym? it costs less than half. look at any actual listing with availability
grabber4321@reddit
Lowest price I've seen is 6999 CAD from Newegg.
clairenguyen_ops@reddit
We hit the same routing problem when Anthropic had that capacity wobble in March. Ended up putting a gateway in front (we use Bifrost, LiteLLM and Portkey are both fine too: https://github.com/maximhq/bifrost) mostly so the fallback logic lived in one place instead of being scattered across four services. The retry-on-different-provider bit is what actually saved us, not the unified API.
alexp702@reddit
Man buys $4300 GPU, is surprised it's good. What times we live in!
TacGibs@reddit
"What a time to be alive !"
Valuable-Run2129@reddit (OP)
“Hold on to your papers”
toptier4093@reddit
Laughs in Mac Studio
Ill_Initiative_8793@reddit
Laughs in 48GB 4090
toptier4093@reddit
Cries in second mortgage
More-Curious816@reddit
They need to make a workstation edition for professionals. The current version can't get more compute because it can't draw more power, because the small form factor can't handle the heat.
Double its physical size, double the power draw, add 2 Ultra chips inside, and it will sell like hot cakes.
techlatest_net@reddit
Congrats on the build! That's a serious setup and those numbers sound fantastic—4400 t/s PP is no joke. Totally get the hesitation between Mac and PC but for local LLMs the flexibility (and raw VRAM) of a workstation GPU is hard to beat. Glad it all came together even with the Linux learning curve. Enjoy the speed!
neo123every1iskill@reddit
What’s TG? What’s PP? Damn acronyms.
Valuable-Run2129@reddit (OP)
Token generation and prompt processing.
neo123every1iskill@reddit
Thanks
jacek2023@reddit
"I honestly don't know why people are not talking about this gpu more" probably because RTX 6000 Pro
I still think 5090 is just a bad choice but people buy them for some reason
FullOf_Bad_Ideas@reddit
Dense compute, and you can stack multiple of them to get VRAM too.
3 5090s have more total compute than a single RTX 6000 Pro at a similar price.
The 5070 Ti is the best compute per dollar, but you'd need 2x more of them, so it gets kinda annoying to do.
RG_Fusion@reddit
Sure, but how are you going to power them? You only need two 5090s in a desktop to hit the power limit of a home outlet. That's why these professional cards are better for LLMs. I can run a server with 4 RTX 4500s along with the base power needs of a server and still run on a single 120V outlet. The RTX Pro 6000 Max-Q only needs 100 watts more per card than the 4500.
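The rough outlet math behind that point, using assumed US residential numbers:

```python
# Why two stock 5090s push a US 120V circuit (all inputs assumed).
volts, amps  = 120, 15
breaker_w    = volts * amps        # 1800 W circuit limit
continuous_w = breaker_w * 0.8     # ~1440 W continuous-load rule of thumb

two_5090s_w   = 2 * 575            # stock 5090 board power
rest_of_box_w = 300                # CPU, drives, fans, PSU losses (guess)
print(two_5090s_w + rest_of_box_w, "W draw vs", int(continuous_w), "W limit")  # 1450 vs 1440
```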
FullOf_Bad_Ideas@reddit
I run 8 3090 Tis from one outlet. I don't live in the US and have up to 3680W per phase, so it didn't cross my mind as an issue for stacking 5090s - I'd probably get 8, undervolt them, and run around 400W each.
Max-Q sucks compute-wise compared to the 6000 Pro; I'd rather undervolt a 6000 Pro or 5090 when needed.
When I tested compute with the MAMF benchmark, the 6000 Pro Max-Q averaged about 306 TFLOPS, the 6000 Pro WS 392 TFLOPS, the 5090 240, and the RTX 5000 Pro 235 TFLOPS. So the Max-Q loses ~25% of the perf even though it's close to the WS in pricing.
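Putting those quoted figures side by side:

```python
# Relative compute from the MAMF figures quoted above.
tflops = {"6000 Pro WS": 392, "6000 Pro Max-Q": 306, "5090": 240, "5000 Pro": 235}
baseline = tflops["6000 Pro WS"]
for card, t in tflops.items():
    print(f"{card}: {t / baseline:.0%} of the WS")  # Max-Q ~78%, 5090 ~61%, 5000 Pro ~60%
```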
Schneller52@reddit
5090 is a bad idea for just LLMs. But a good fit if you do other things with your PC.
jacek2023@reddit
I use a 5070 for "other things with PC", what is your use case for the 5090?
Schneller52@reddit
Do you use your 5070 as your primary LLM GPU?
jacek2023@reddit
no, I use 3x3090 as my primary LLM GPUs, I use 5070 for desktop and for tiny LLM tests (like Qwen 35B Q4)
Schneller52@reddit
Pretty much my point lol.
Previous_Feeling_484@reddit
Like warming your house /s
Schneller52@reddit
In a sub where people commonly stack multiple 3090 furnaces, I find that kind of funny lol
ProfessionalSpend589@reddit
Or warming the planet.
graypasser@reddit
What's so bad about the 5090 tho? From price to memory to compute, its numbers are not that bad.
Freonr2@reddit
5000 Pro: 14080 CUDA cores, 1.34 TB/s
5090: 21760 (+54% vs 5000 Pro), 1.8 TB/s (+34%)
6000 Pro: 24064 (+11% vs 5090, +71% vs 5000 Pro), 1.8 TB/s (+0% vs 5090)
I don't think it is all that clear.
notdba@reddit
For hybrid GPU/CPU inference of large MoE models, a single 5090 is the best choice to get good PP. 6000 Pro has slightly better PP but it is much more expensive.
panchovix@reddit
5090 at MSRP makes a bit of sense IMO, but above 3K USD it just doesn't. And I say this while having 4x5090 (which I love btw) and a 1x6000 PRO.
In theory the best NVIDIA cards for VRAM/price would be the RTX 4060 Ti 16GB/5060 Ti 16GB.
Moscato359@reddit
But are 5060 Tis actually any good at AI processing?
panchovix@reddit
They are decent, like a 2080 Ti/3070 in compute performance. A 5070 Ti is noticeably faster but also a good amount more expensive.
Valuable-Run2129@reddit (OP)
The RTX 6000 Pro costs exactly twice as much as I paid.
popecostea@reddit
All the more reason to buy another RTX PRO 5k.
DAlmighty@reddit
Or sell the RTX Pro 5k and buy a Pro 6k
burdzi@reddit
Yeah and don't forget - 5090 existed before 5000 RTX pro. I bought mine before 5000 was available 😅
letsbefrds@reddit
$4300 is a good deal. I've been going back and forth: 48GB, 72GB, or suck it up and get the 6000 Pro lol
Valuable-Run2129@reddit (OP)
If you have the budget, go with the 6000. The 5000 72GB doesn't make much sense though; the price is too close to the 6000.
letsbefrds@reddit
Haha I wouldn't be flipping and flopping if I had the budget for the 6000
It's $4399 at Microcenter. I'll probably just sell my 7900 XTX and pick it up. Glad to hear you had a good experience.
Low_Twist_4917@reddit
I’d honestly stretch the budget for the 6000. I know that sounds like a privileged thing to say in some sense but I really wouldn’t spend almost 5k on a GPU with half the vRAM of one u can get for 8.5k at Central Computers.
letsbefrds@reddit
Aren't they like 8899 there?
I can stretch the budget... We're all privileged to be throwing down this amount of money, which could literally feed someone for a year across the world...
I definitely get where you're coming from though. The issue with the 5000 isn't the price, it's that it's slower than the 5090; if it were a 5090 with more VRAM I woulda snatched it up already... But coming from a 7900 XTX it's going to feel lightning fast regardless...
Low_Twist_4917@reddit
I could’ve sworn it was 8399 a few weeks ago. I picked mine up from MBPC for around that. You’re 100% right though.
And yea, I totally get the definite increase regardless of which card you get. I ran a 5090 before upgrading to multiple 6000s and to your point - the lack of VRAM made inference INSANELY slow. I will say that since switching to the RTX 6000s it hasn't been an issue. I will also note that imho going from 32 to 48 isn't that big of a jump for local inference.
TechnologyGrouchy679@reddit
💯
Low_Twist_4917@reddit
I’m running rtx 6000 pros. They’re the best card you can get for local inference period.
TechnologyGrouchy679@reddit
💯
__JockY__@reddit
Hey, you did it! Awesome! Glad that post of mine helped out.
The 5000 PRO is a great GPU… now… placing bets on when your 2nd one gets ordered…
Valuable-Run2129@reddit (OP)
Thanks again for sharing all that info!
I'm now trying to store prefixes in RAM (so I can juggle 2 or 3 contexts without reprocessing), but have had no luck. It seems to be incompatible with some of the settings. How would you go about it?
__JockY__@reddit
I haven't tried, but my understanding is that LMCache is the way to go with vLLM.
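A rough, untested sketch of what the wiring could look like, going by the LMCache docs; treat the connector name and env vars as assumptions and check the current docs:

```python
# Untested sketch: vLLM + LMCache so KV prefixes spill to system RAM.
import os
from vllm import LLM
from vllm.config import KVTransferConfig

os.environ["LMCACHE_LOCAL_CPU"] = "True"         # keep evicted KV blocks in CPU RAM
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "40"  # GB of RAM for cached prefixes

llm = LLM(
    model="Qwen/Qwen3.6-27B-FP8",                # model ID assumed
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",
        kv_role="kv_both",
    ),
)
```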
fasti-au@reddit
Llama.cpp Tom quant and you can stack extras the 27b architect a 35b as main coder and a 9b doco trailer park girl worker redis
Turbulent-Week1136@reddit
RTX 5000 pro seems more like a mem-maxxed 5080 rather than half of a rtx 6000. I just picked up an RTX 6000 earlier this week for around $8300 so I will be playing around with that this weekend.
RG_Fusion@reddit
Yes, the RTX pro 6000 is undoubtedly the better card, but most people can't reasonably take out a loan for a GPU.
JohnToFire@reddit
How's the blower fan noise at idle and at speed? That's why I couldn't choose an RTX 5000 and was instead choosing between a 5090 and a 6000.
RG_Fusion@reddit
I have a pro 4500 and can also confirm there is zero noise at idle. Even at 60% usage they are very quiet.
Valuable-Run2129@reddit (OP)
At idle no noise at all. The cpu fan is much louder.
kartblanch@reddit
I loaded a 120B today on my machine, 5090 + 64 gigs RAM. Ran fine. Why do I need to spend more????
Valuable-Run2129@reddit (OP)
Because you can’t run Qwen 27B FP8 with unquantized cache. Quantized cache sucks hard
simotune@reddit
The underrated part here is the lower hassle per token, not just the raw speed. 48GB with sane power/noise and enough context sounds like a way nicer daily-driver setup than people give it credit for.
aesu24@reddit
And now Hermes?
Valuable-Run2129@reddit (OP)
I made my own harness: https://github.com/permaevidence/LocalAgent
It feels quite awesome
Select-Reporter5066@reddit
The 4400 tok/s prompt processing is the real buried lede here. Everyone argues raw t/s, then a 200k context prompt shows up and suddenly the boring workstation card is wearing a cape.
Valuable-Run2129@reddit (OP)
Having a big PP is all that matters
Slowdive91@reddit
I'm not convinced it's worth the cost for local models.
Puzzleheaded_Base302@reddit
Would you mind confirming the idle power draw of the RTX PRO 5000?
Organic-Thought8662@reddit
Mine idles at around 16W.
BAL-BADOS@reddit
I can't afford paying almost $6000 for an RTX 5000 Pro 48GB since I can't make that money back.
I had to settle for an $1800 Mac Studio Ultra 64GB. While it's nowhere near as fast as the RTX, I just leave it running while I do other tasks.
JayTheProdigy16@reddit
Just so people know, as of early 2026 there is a revised 72GB variant of the RTX PRO 5000 Blackwell, which I was lucky enough to catch at my local Microcenter for about $6,600. That's decent for post-RAM-pocalypse prices as far as I could tell, but there seems to be very little info about the 72GB card out there online. Anyways, I'm running that alongside my 3090 to bring my rig to 96GB VRAM + 128GB Strix Halo. Very lovely.
super1701@reddit
I was thinking of doing a Strix with 4080s for prefill. Not sure how much value that would be. I have dual RTX 8000s currently, and on price to performance I will not complain.
Draco32@reddit
What software stack did you run for this?
JayTheProdigy16@reddit
I use Proxmox on the Strix with an Ubuntu 24 VM and all 3 GPUs configured for passthrough to that VM. Inside that, llama.cpp built with CUDA + Vulkan; I've used ROCm before but I found Vulkan to be faster for the Strix. I also ran into a weird compatibility issue between Blackwell and Strix (that did NOT occur with Ampere x Strix) with CUDA ops that would crash llama.cpp, so I ended up using Codex to create a custom patch to support those ops and now it works flawlessly.
ProfessionalSpend589@reddit
Did you attach the 2 GPUs on the same Halo?
I've been meaning to post a question on r/StrixHalo for some time asking if anyone is running it with 2 GPUs, but keep forgetting.
JayTheProdigy16@reddit
Yea, RTX PRO via M.2-Oculink and 3090 via Thunderbolt
egudegi@reddit
the 4400 t/s prefill is insane and nobody talks about it. everyone obsesses over TG because that's what you feel during a conversation, but if you're doing anything with long context, RAG, or batch jobs that PP number is the one that actually matters. and this card just obliterates consumer GPUs there.
also the electricity math is real. two 5090s running hot 8 hours a day adds up fast. this thing is basically a server GPU at a consumer-ish price point and people are sleeping on it because it doesn't have a flashy gaming brand attached.
good write-up, more people need to see actual real-world numbers from someone who just built their first PC and got it running. refreshing vs the usual "here's my theoretical benchmark" posts.
MisticRain69@reddit
Yes, I like my Strix Halo, but good god the PP is so slow. Even with Qwen 3.6 27B Q8 and a 3090 eGPU, which took my TG from 6.7 tk/s to 14 tk/s (and with MTP it's now 22-36 tk/s depending on acceptance rate), the PP is very slow: 600 tk/s with no MTP and 300 tk/s with MTP. It takes ages to process larger prompts, especially if something invalidates the 70k-token KV cache.
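That acceptance-rate spread lines up with a toy model of speculative decoding throughput; this is simplified and ignores verification overhead, so treat it as illustration only:

```python
# Toy MTP model: each step emits 1 verified token plus however many of the
# k drafted tokens get accepted in a row. Verification cost ignored (assumption).
base_tps = 14     # TG without MTP, from the comment above
k = 3             # drafted tokens per step (assumed)

def mtp_tps(acceptance: float) -> float:
    expected_extra = sum(acceptance ** i for i in range(1, k + 1))
    return base_tps * (1 + expected_extra)

for a in (0.4, 0.6, 0.8):
    print(f"acceptance {a}: ~{mtp_tps(a):.0f} tk/s")  # ~23, ~30, ~41 -- near the observed 22-36
```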
SkyFeistyLlama8@reddit
Welcome to the same nightmare unified RAM users have been facing for years now. PP is dead slow compared to Nvidia GPUs, like I can wait minutes for a long prompt to process before the first token is generated. And yeah, woe to me if something invalidates the KV cache because I'll have time to go make another cup of coffee.
MisticRain69@reddit
From what I've seen the Spark has about the same TG as the Strix but much more PP. Like for MiniMax M2.7, the Strix on Vulkan peaks for me at 200 tk/s PP, while on the Spark I see the average being 2000 tk/s or more.
finevelyn@reddit
I don't think it is. The 5090s are either going to be faster at a higher energy consumption, or if run at a similar t/s then going to be similar in energy consumption as well. The 5000 pro just has a lower ceiling in both performance and energy consumption, but the 5090s can be configured to match it if you don't want to run them at peak performance.
voyager256@reddit
4090 and 5090 are consumer cards (I guess except for the latter's inflated price), and the 5090 is basically an RTX Pro 6000 with "only" 32GB VRAM. So significantly faster than the RTX Pro 5000 IF the model fits in VRAM.
human_bean_@reddit
Prefill on consumer cards is also quick, and most people should both undervolt and heavily power limit them, at next to zero performance cost.
Accomplished-Sock262@reddit
I have this card. How do I load this up onto it? What coding performance can I expect? Sonnet or way less?
simotune@reddit
48GB is where local inference starts feeling practical instead of aspirational. VRAM headroom changes day-to-day usability more than people expect.
Shapespheric@reddit
Wonder what people think about the 4500 PRO. Seems like a decent deal too compared to resale prices for the 4090 and inflated 5090 prices at stores.
ClickClawAI@reddit
Make sure to use plenty of lotion 😆
KeithHanson@reddit
Not me over here realizing I could write this off on taxes next year 😂😭
Do you (or anyone here) happen to use this kind of setup to replace Codex/Claude work? I am not interested in the cost savings. I want it for doing code things while using uncensored models and consistent behavior.
One thing I think that gets overlooked in the cloud vs local debate is that consistency. Over the past two days I’ve noticed changes in the way Codex 5.3 via OpenCode behaves - often stopping at “I’ll implement this now.” Repeatedly. My coworker with the same setup but on different worktrees noted the exact same behavior driving her mad and I almost jumped out of my chair with a me too!
Anyways, I don’t like it. I want to get the thing to the point that it does what I want in the general way I want and know I have control over that consistency (harness engineering is impossible if the model changes under your feet and you have no control of that!)
Thanks for coming to my TED talk and oh btw any opinions on local coding setup with this compared to frontier models? I could deal with 200k token context no problem.
Valuable-Run2129@reddit (OP)
The closest thing to what you are asking is DeepSeek V4 Flash. You'll need two RTX 6000s to run it, but it'll give you the closest experience to SOTA models.
panchovix@reddit
I just wish the RTX 5000 PRO wasn't so neutered. They really disabled a lot of cores on that GB202 die. The RTX 4500 PRO has the full GB203 die, but it's well slower.
I guess NVIDIA will eventually release something like an RTX 5500 PRO with more cores.
slavik-dev@reddit
Here is my report on RTX 4090D modded to 48GB:
https://huggingface.co/Qwen/Qwen3.6-27B-FP8/discussions/11
Getting about the same speed.
Currently you can buy it for $3500 from the C2 site (not sure if it ships from China or Hong Kong?)
Freonr2@reddit
Yeah RTX 6000 Ada (4090-ish) actually has faster bf16 compute than the 5000 Pro Blackwell. It's a sidegrade at best with the same VRAM.
teknic111@reddit
Why not just get two 5090s? It's cheaper and gives you more memory.
Valuable-Run2129@reddit (OP)
It's more expensive. 5090s go for $3500 each; that's almost $3000 more. Plus the rest of the gear you have to buy, which costs more than what I needed for a single GPU.
teknic111@reddit
I have two 5090s and I paid $2000 for each.
Valuable-Run2129@reddit (OP)
I just need a time machine then! I’ll grab some bitcoins at 100 dollars when I’m there.
awakened_primate@reddit
Big PP, noice!
Valuable-Run2129@reddit (OP)
It’s all about the PP
Nnyan@reddit
I like the RTX 5000 Pro and it's on my radar, but I'm not finding any (at least not once I filter out sketchy sellers). How did you end up cooling it, and what are the noise levels?
Valuable-Run2129@reddit (OP)
I bought it from B&H Photo, used with a 90-day guarantee, so it was a very safe buy.
Noise is really good. The CPU fan is louder than the GPU.
DeepOrangeSky@reddit
Are you sure the fans were maxed out? The reason I ask is that in the past I kept seeing people say that because it uses the pro "blower style" fan system (rather than the ordinary consumer-grade fan system that the 30/40/50 series, i.e. 3090 or 5090, etc., use), it has a much more annoying, somewhat higher-pitched, more "vacuum cleaner" type of sound, vs the regular consumer cards, which sound like a barely noticeable deep hum by comparison.
I guess it could depend on the setup though, like what kind of case it is in, how far away, which way it is angled away/towards you for the exhaust side, and if it was already noisy in your house or if you were like alone in a quiet room with it late at night, or so on.
Or could be that people were just exaggerating or making too big of a deal about how much worse its noise supposedly was than the consumer RTX's.
Savantskie1@reddit
Blower cards have always had haters. I’ve never heard them over the 6 140mm fans I have in my pc. And they keep my apartment warm in the winter lol
laul_pogan@reddit
One vLLM gotcha to watch on 27B models: keep --gpu-memory-utilization at 0.60 or below. At 0.85 the allocator can wedge the process hard mid-request, requiring a full kill and restart. Counterintuitive because higher looks like more throughput, but the KV cache reservation at inference time can push past what the allocator estimated at startup. Your 200k FP8 weight + bf16 KV combo is already tight on 48GB; anything that spikes over the ceiling during a real long-context request will stall the whole process, not just that request. 0.55-0.60 is the stable range in practice on cards this size.
DeepOrangeSky@reddit
Btw, debating whether to make a separate thread to ask about it, but:
Does anyone know if there is a very significant difference in durability, for AI use-cases (using at high continuous intensity, all day long, day after day) of consumer-grade GPUs vs workstation GPUs (i.e. 3090s, 4090s, etc, vs Pro 5000s, Pro 6000s, etc)?
I'd assume the difference, if there is a significant one, would be most stark regarding the 5090 in particular (even if power limited, maybe), since it gets the hottest/most strain out of any of the main GPUs of note, probably.
But, yea like, if you build a big expensive rig of consumer-grade cards like 3090s or something, which were designed with the intention of them being used for gaming, and not for AI inference, let alone AI training or video generation or whatever the most brutal continuous high strain use case would be, vs getting Pro 5000/Pro 6000, is there a major difference in how these hold up over time?
I mean, I guess maybe it could also depend on what type of AI use-cases, like if it is for mainly constant video generation all day, vs if it is for LLMs, vs if it is for training, or so on (i.e. how "continuous" the strain is at max level, vs intermittent bursts)?
If the 5090 is way worse at this than the Pro workstation cards, then it makes the strangely small price difference between the 5090 and the Pro 5000 that people have been discussing on here lately even more bizarre.
Are the Pro 5000/Pro 6000 cards that much worse for gaming, like maybe for day-1 ability to be used on new releases or something (I'm not a gamer, so I don't know how that stuff works)? Is there some fallback safety net for the 5090, where even if AI crashes out it's way more convenient for gaming than a Pro 5000 or Pro 6000 for some reason? I mean reasons other than raw hardware capability, or maybe even the hardware, if the slightly higher raw speeds on some specs plus overclocking matter.
Or are the Pro cards more difficult to set up, with different, more annoying drivers, or worse software support/compatibility, or however all that stuff works?
Like, are the Pro 5000 and Pro 6000 just blatantly better in basically every way, with no good explanation for the 5090's price compared to the Pro 5000, and for why everyone keeps buying the 5090 at near Pro 5000 prices despite lower durability, a lot less VRAM, worse power usage, and so on? Or is the durability pretty similar regardless of use-case, and do the 5090 (and 3090s, 4090s, etc.) have some kind of convenience advantage for gaming or whatever compared to the pro workstation cards, where they can be used in a more easy or convenient way?
leonbollerup@reddit
One tip: next time, don't use Claude, use warp.dev instead. SSH in via warp.dev and have it do things for you.
CreativelyBankrupt@reddit
Please post some real world benchmarks if you ever capture any!
Long_comment_san@reddit
I'd say 2x 5090 is a better deal overall, but it's a LOT more tricky to set up (power use, case, motherboard).
It still sucks balls the size of Jupiter that 48 gigs of VRAM is priced so ridiculously you would assume it uses HBM memory. It's wild that it's just GDDR7.
Thrumpwart@reddit
First of all - frontier models (even free access plans) are a godsend for Linux noobs. I used Gemini's free tier for Linux configuration and troubleshooting and it really does well.
Second - congrats! That's very good performance! Good to hear it's quiet too!
MundanePercentage674@reddit
At that price, how does it compare to 4x AMD Radeon AI PRO R9700?
Valuable-Run2129@reddit (OP)
I have no clue! This is my first PC build ever. Hopefully someone here can help you with that information.
MundanePercentage674@reddit
It'll definitely work for your needs right now, but I'm afraid smarter, more VRAM-hungry models will come out in the future. Personally, at that price I'd be willing to spend on 4x AMD Radeon AI PRO R9700. I'm running an AMD 5950X with 64GB, and I just lucked out upgrading to 128GB for $120+ right before AI demand sent RAM prices through the roof a month later.
Valuable-Run2129@reddit (OP)
What numbers are you getting on the same model? Both PP and TG?
AustinM731@reddit
I have 4 R9700s, and I can get ~4k pp t/s, and ~100 tg t/s. This is with the FP8 quant and MTP=3.
ComfortablePlenty513@reddit
For $5k you could have gotten a DGX (in your OEM flavor of choice - Dell, Asus, etc.), which has 128GB of unified memory and can be clustered via SFP.
Valuable-Run2129@reddit (OP)
Prompt processing would have been 80% slower or more. Token generation 70% slower.
There’s no point in running bigger models if the speed is practically unusable.
qfox337@reddit
$4300 after taxes is a good deal, and +1 for noise/power concerns. Also, I imagine it's really nice to just have a bit more RAM and spend less time tweaking stuff, or have some extra for any applications that use it (browsers, Blender, ML research, whatever). And you'll be able to fine-tune some smaller models locally. The 5090 was a good deal at its MSRP of $2000, but it doesn't look like Nvidia is interested in making a whole lot more at that price.
Long-Chemistry-5525@reddit
I would almost suggest upping to 70B, as some models have a ctx limit.