GPU Prices. Buy now, or buy later?
Posted by knob-0u812@reddit | LocalLLaMA | View on Reddit | 103 comments
If the Community could sound off on this, I'd be grateful.
Do you think GPU prices are going to stop skyrocketing? Is this FOMO and hype driving the adoption of local inference? I wonder if this mass-market adoption will last for years? Is it a long-term trend? If I wait 6 months, will I regret it? (cause prices are going to keep screaming). I don't know about RAM pricing... is that temporary?
Backstory:
I bought an M3 mbp max in Nov 2023 (128g, 4tb, 16core cpu / 40core gpu).
I use it as a desktop, with 20tb of external storage.
5 different production workflows running about a dozen daily crons. (everything from BERT models to 30b LLMs in prod, with RSLoRA adapters I've trained for specific tasks.)
3 different agent harnesses (2 customs and Hermes). I still hit openrouter (glm-5.1/minimax) for orchestration, and even anthropic for heavy coding tasks.
I'm sitting on the fence about buying a 1x5090 rig, expandable to 3 GPUs, and plug-n-play with a Pro 6000. But $10k is a hard swallow.
This would allow me to run Qwen3.6-35B-A3B-4bit and 27b-4bit in production for sub-agent delegations (4x sub agents concurrent with sufficient KV Cache).
Plan to run this headless as an inference server:
Build: ~$10k
AMD Ryzen 9 9950X 4.3GHz 16 Core 170W
64GB (2x DDR5 32GB)
NVIDIA GeForce RTX 5090 32GB
2TB NVMe PCIe Gen5 M.2 SSD
Fractal Design Define 7 XL case
Super Flower LEADEX Titanium 1700W
Asetek 624S-M2 240mm CPU Cooler
Case Fans Upgrade Kit (PWM Ramping)
===========
Be kind. lol
cniinc@reddit
TLDR: Nobody knows, but IMO no change until 2028 at least.
I think you should read "Where's Your Ed At" (www.wheresyoured.at/) and look at the question of datacenters. Ed Zitron is a big AI critic who feels very strongly that the market should crash, but that everyone who is pro-AI is putting their finger on the scale to keep the gravy train rolling. He is specifically arguing that the datacenters are going to be the epicenter of the crash, because they just can't build them at any appreciable speed, the orders for graphics cards assume the datacenters will be there to put them in, and the eye-watering cost of these centers is being funded by money people who are getting queasy.
Personally, I agree with him. I think that's what's going on behind the scenes. BUT, that doesn't answer your question. It all depends on when people stop paying based on the yes-man stories they're hearing. When will that happen? Nobody knows.
I think at some point there's going to be a glut of Blackwells or whatever they're planning on putting in those datacenters, because the purchases have already been made, many are sitting in warehouses, but the centers aren't built yet. By the time they are, new GPUs are gonna come in that are way more energy efficient or powerful or whatever. NVIDIA just announced a whole new slate of machines that they're taking orders for. From an accounting perspective, the prior GPUs have already depreciated because they're old now. At some point somebody is gonna say "I can't keep fronting cash for all this" and is gonna stop paying, and then things will get sold off. When that is, of course, I have no idea.
When that happens, maybe the buying company will keep the machines and use them as a big tax writeoff. Maybe the other companies will swoop in and buy as much as they can to gain that little edge. But hopefully a bunch will also go to the general market, or to companies like Thunder Compute for people to rent and build remote versions of local AI.
There's one other wrinkle - the big companies have daddy Trump to fund them. I think the big AI companies, and every grifter VC firm or partner company, are selling him a story that China will win the AI arms race if we don't pour the nation's money into this. Trump, of course, couldn't tell the difference between a RAG and an LLM, and he needs to sell that he's tough on China, so he's gonna believe them. So, while he's in power until January 2029, I think he'll do everything he can to keep the bubble growing and not popping.
So, i mean, I don't think anything will get better in 6 months. I don't even know if it will get better in 2028.
AAAAAAAAAAHHGG@reddit
They’re only going to go up as more people start their own ai-related services. I’m looking to improve my setup as much as possible as quickly as possible
relmny@reddit
10k for a 5090 and 64gb RAM (the most important parts for local) is insane. Also 64gb, if you plan to offload, will be very tight, better 128gb.
But for that money, try to find an rtx 6000 pro or 5000 PRO (48gb VRAM, should cost around half of that budget) and put it in whatever PC you can find.
If you still want a new PC instead, look for pre-built ones. I wanted a 5090 FE and for about 7% more (than what the GPU alone would've cost me), I got a new PC with a Ryzen 7 9800X3D, 48gb RAM (I didn't care, because I just wanted the 5090), case, etc...
knob-0u812@reddit (OP)
thanks... based on the feedback I got on this thread, I bought a 5000 Pro 72gb, open-box, for $6500 and an eGPU dock. Figure I'll own the gpu, which is the component I think is most likely to hold value (if not increase in value meaningfully).
geldonyetich@reddit
Ram/storage prices quadrupled over the last year. Some would say that's bound to reverse in time and you should wait. Others (say, the folk who bought out the Steam Deck after the price hike) seem to think it's going to increase yet more. Personally, I am inclined to boycott.
BobbyL2k@reddit
Boycott doesn’t work tho. Because we are such a small market, the ones driving up the RAM prices are the hyper scalers (data centers).
There was a GN video covering how personal computer sales saw a sharp drop. People have already stopped buying gaming computers, and that market is much larger than us local AI crowd.
I definitely agree with boycotting scalper prices, but this is supply and demand. The elevated prices are from manufacturers.
geldonyetich@reddit
True, it's more of a boycott from a, "Screw these prices, I'm out" perspective than a, "This'll get The Man to change their dirty ways" hope.
Honestly, the PC enthusiast market looks to some like it's collapsing, and I can no more vote with my wallet to keep it alive than I can to threaten it with extinction.
Rampocolypse is the latest in a series of blows against the increasingly niche hobby that enthusiast-owned general-purpose PCs have become.
But I don't blame AI. I blame development money going in the direction of power efficiency and mobile form factors because, as you said, we are the smaller market, not just versus big data centers, but also comparing the number of smart phone owners to us. That predates AI, it's been going on for decades. Getting gouged with niche hobby prices is the inertia finally catching up.
It does leave us in an awkward spot in terms of reclaiming some sovereignty by having the necessary hardware to do local AI. If hardware manufacturers have no interest in winning us back as a tertiary market, I think we might have to start looking at more creative solutions than just iterating enthusiast PC hardware.
AdOne8437@reddit
the only relief for prices i see at the moment would be a chinese company still in stealth mode with their own fabs and a pirated cuda layer on their cards. and even if such a company would suddenly appear, it would be months.
SSHB1@reddit
Looks solid as a single GPU dev/inference box, but I wouldn’t call it a 3 GPU expandable server/machine. The weak points are platform, not GPU…9950X gives you dual channel memory, 64GB RAM is light for agent/orchestration work and honestly the motherboard choice matters more than the CPU name here. On AM5, three GPUs can become a lane/spacing/cooling compromise very quickly. For one 5090 it’s fine. For a proper expandable inference server, I’d want 128GB minimum, ideally 192–256GB, and I’d choose the board/platform around PCIe layout first. I hope this helps … good luck.
knob-0u812@reddit (OP)
If I want a board that will run 2x Blackwell cards (6000 max-qs, for instance), which board would you choose?
UniversalSpermDonor@reddit
Just be forewarned - Threadripper Pros only work with RDIMM memory, and RDIMM DDR5 memory makes UDIMM look cheap.
Used prices: 32GB sticks of DDR5 RDIMM are ~$550 each, 32GB DDR5 UDIMM are ~$300 each.
That said, one huge benefit of the TR Pro is 8-channel memory - it means that offloading to the CPU can give you good decode for MOE models, although prefill takes a nosedive. My clusterfuck of a build is 2x R9700s, 2 EPYC 7532 CPUs, 16x32 GB of DDR4-3200, and (previously) 4x Radeon V620s. On GLM-4.7 quantized to Q3_XXS, I got 12 tokens/sec of decode using an R9700 for attention and a bit of the model, and 16 tokens/sec of decode using all 6 GPUs. (That's more of an indictment of the fact that R9700s aren't great and V620s aren't good.)
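For a rough feel of why channel count matters for CPU offload, here's a minimal Python sketch of theoretical peak memory bandwidth and the decode ceiling it implies. It assumes one DIMM per channel, 64-bit (8-byte) channels, and that decode is purely bandwidth-bound, so real numbers will come in lower. The function names and the 12 GB "active weights per token" value are made up for illustration, not measurements.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (a DDR channel is 64 bits = 8 bytes wide)."""
    return channels * mega_transfers_per_s * bus_bytes / 1000


def decode_ceiling_tok_s(bandwidth_gbs: float, active_gb_per_token: float) -> float:
    """Upper bound on decode speed if each token streams the active weights from RAM once."""
    return bandwidth_gbs / active_gb_per_token


if __name__ == "__main__":
    # DDR4-3200 with 8 channels per socket, as in the dual-EPYC build described above.
    per_socket = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
    print(f"per socket: {per_socket:.1f} GB/s")

    # ASSUMPTION: 12 GB of active weights per token is a placeholder value,
    # not a figure for any specific model or quant.
    print(f"decode ceiling: {decode_ceiling_tok_s(per_socket, 12.0):.1f} tok/s")
```

Swap in your own channel count, memory speed, and the active-weight size of the model you actually run; the point is only that the ceiling scales linearly with channels.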
SSHB1@reddit
If I was building around 2 Blackwell cards I wouldn’t start with a random AM5 board tbh… I’d pick the platform first.
3 I’d look at…
ASUS Pro WS WRX90E-SAGE SE… clean workstation route… Threadripper Pro, proper PCIe layout, ECC RDIMM, plenty of slots.
ASRock Rack WRX90 WS EVO… probably the one I’d lean towards for headless/server style use if price and availability are sensible… WRX90, ECC RDIMM, dual 10G, IPMI etc.
Supermicro H13SSL-NT… EPYC route… more proper server than workstation… better RAM ceiling and headless behaviour, but then chassis, risers, airflow and GPU spacing matter even more.
I wouldn’t personally build a serious 2–3 GPU inference box around AM5 unless I accepted from day one that it’s really a single GPU box with future compromises.
Also I’d avoid bro-type seller forecasts and “trust me it works” claims… use official board specs, QVL/support pages, vendor docs and return-friendly outlets so you have comeback if something doesn’t behave. Please do your own research before buying though… multi-GPU builds get expensive quickly if the board/spacing and airflow choice is wrong.
SSHB1@reddit
Also check the exact board revision before buying… not just the model name. With workstation/server boards, revision, BIOS/BMC/IPMI version and QVL support matter. Make sure it’s all up to date, call the vendor and confirm.
knob-0u812@reddit (OP)
really great/constructive feedback man. Thank you
ai_without_borders@reddit
the buy-now question comes down to your monthly cloud bill not GPU price speculation imo. if youre running real production workloads on openrouter/anthropic at $500-1k/month, the $10k break-even is roughly 10-20 months, and thats if prices stay flat. the people who regret waiting are usually the ones who kept punting while their API bills kept climbing. the people who regret buying are usually running toy workloads that didnt justify local infra in the first place. given your RSLoRA setup and agent harnesses, youre clearly past the toy threshold.
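To put numbers on the break-even point, here's a minimal Python sketch. It assumes a flat monthly cloud bill that the rig fully replaces, and ignores electricity, resale value, and price changes. `break_even_months` is a hypothetical helper name, not something from the thread.

```python
def break_even_months(hardware_cost: float, monthly_cloud_bill: float) -> float:
    """Months until cumulative cloud spend equals the up-front hardware cost.

    Simplification: flat monthly bill, fully replaced by the local rig,
    with no electricity, resale value, or price changes taken into account.
    """
    if monthly_cloud_bill <= 0:
        raise ValueError("monthly_cloud_bill must be positive")
    return hardware_cost / monthly_cloud_bill


if __name__ == "__main__":
    for bill in (500, 1000):
        months = break_even_months(10_000, bill)
        print(f"${bill}/month -> break-even in {months:.0f} months")
```

For a $10k build, a $500/month bill gives 20 months and a $1k/month bill gives 10 months. If local only replaces part of the cloud spend, plug in just the replaced portion.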
DeltaSqueezer@reddit
In my local market the RTX Pro 6000 cost $8,300. I ordered one on credit and then chickened out and cancelled it. Now it costs $11,000. A 30%+ price rise in a few months.
My fear is that we are the early ones and so this is only going to get worse. I was hoping that next gen GPUs might come out and push prices lower or allow more performance for the same dollar, but now I'm wondering whether demand is going to grow way faster than supply and keep prices going upwards.
Nvidia has no real competition in the discrete GPU space for AI and no incentive to reduce prices. Heck, it's hardly worth their time to even create and market such products - from a financial perspective they should just design and produce datacenter products for the next few years.
meca23@reddit
I bought one in March for 8,700. Was planning to buy more if it dropped in price, now almost kicking myself that I didn't buy a second one as we are getting models now that fit nicely into 2 rtx 6000 pros.
SoulStripHer@reddit
I think there will be an improvement in AI that reduces VRAM requirements along with an eventual increase in chip supply.
yamoksauceforthelazy@reddit
The Qwen models are not useful in a production environment whatsoever. Don’t believe anyone who says otherwise. I know that’s a big ask, but I’ve tested the hell out of them over the past 4 months, and I’ve found 3.6 A3b to be on par with 2-4B dense models. A toy. Not useful for much beyond lightweight things like generating file names and such.
The idea that it has anything in the same universe as flagship performance is bonkers. I literally just produced an example today while using it in the Qwen Studio app in which it failed to generate a skill file from a documentation website. Like totally fell flat on its face. It never even looked at the link, said it did, then generated a completely nonsense fake skill with completely hallucinated information, and this is not an isolated example. It’s pretty much the average quality bar. I can only assume the vast majority of people claiming to use any Qwen models for production code to be astroturfing bots, because in my experience it’s just not possible. The 3.6-27B dense model with the same settings and instructions did the job well enough, but that is about the upper threshold of its capabilities imo. In the hundreds of real world use situations I’ve thrown at Qwen models (I’ve been a bit obsessed with getting something useful out of them), I’ve not a single time had any (up to API flagship Qwen from a number of providers including Alibaba’s own API) Qwen model produce code that ended up being used. It always ends up being below acceptable standards and ripped out and replaced by Claude or Codex. Just be warned. You will spend $10k and gain no real utility beyond a novelty.
DutchDevil@reddit
As someone who is thinking about investing a lot (for me) of money in a local setup this scares me. Not expecting frontier model quality but it will need to be very good. I am thinking about a 128gb 395+ as speed is less important than quality and my workflow would not be interactive
Honest-Kangaroo-1830@reddit
I recommend you watch this video, it should give you some insight if the capability meets your needs
https://youtu.be/JyS8A-5LIY8
ProfessionalSpend589@reddit
A lone 395+ with 128GB is good if you want to run MoE with full context, but you’ll be left with quite a bit of unused RAM.
If you have a computer already you can buy 2x 32GB GPUs to run Qwen 3.6 27B at reasonable speeds with full context. That costs the same as a fully configured Strix Halo, or maybe even less, but it’ll be about 3 times faster.
Honest-Kangaroo-1830@reddit
I'm not with you when it comes to 27B, but I am with you that people need to evaluate it with their workloads for weeks before considering hardware purchasing.
I think if you shift expectations to match models that came out last year, you'd be about right. Current frontier models can take vague prompts and turn them into usable code. The best local models are best served with smaller scopes and typically execute well against these. For the tasks where I need to do something big and I'm not sure where it will take me, I use Claude Opus via API and have it route tasking to my local models. This works great, and I don't have to do it often.
Competitive_Bad4537@reddit
I was debating this as well, and after I wasted some money on Vast.Ai, I pulled the trigger on a DGX Spark, which was sufficient for my workflow. With Apple lowering studios to 96 gigs of RAM, it didn't give me hope that costs would go down. This was just my educated guess after debating this for the last two months.
Kahvana@reddit
Okay, first:
1. Personally, I don't think they will, due to the current volatile state of the world, regardless of AI hype.
2. The opposite: you only go local once you hit the limitations of cloud-based inference (this can be ownership, privacy, a preference for paying upfront over pay-as-you-go / subscription, custom models, etc).
3. No one knows, we can only guess! I think it will, but in ways that are largely invisible to the end user.
4. It's already enhancing how we use our devices and is talked about by the general public, so I guess so!
5. I know in my case I would've, as I saw what the price trend was doing in September 2025.
6. Simply put, no one knows. Likely, given how high it currently is, yeah. It can take a long time to come down, but that's uncertain (the future).
Alright, now the rest of the post:
I'm currently running 32GB (dual RTX 5060 Ti 16GB), which is quite workable but feels limited in context / quality I can run. Are you comfortable buying used? If so, look at dual RTX 3090 24GB. Otherwise look to at least acquire 48GB, even if it's a tad slower than the RTX 5090.
For your CPU (AMD Ryzen 9 9950X), is there a reason you need this outside of AI? For gaming, you might be better served with the AMD Ryzen 7 9800X3D. The AMD Ryzen 5 9600(X) has all the PCIe lanes you need for your setup.
I don't see your motherboard listed. Which one are you considering? Does it support bifurcation?
BitGreen1270@reddit
I just bought a very similar spec 2 weeks ago. Only difference is to cut costs, I went with a 9700x (CPU not so crazy since GPU should be handling most of it), 2TB gen 4 ssd and 1200W psu. Also got the cheapest case I could get, avoided water cooling ( got the PA 120 SE).
My build cost me 6.3k USD of which the MSI Ventus 3x 5090 cost 4k USD.
My view (speculation) is that prices won't drop for the next 2 years. And even if it drops after 6 months, I don't want to wait that long for learning more about LLMs. That's right, this is a purely learning rig.
NewToReddit4331@reddit
Man paying 4k for a 2k gpu for a learning rig seems a bit ridiculous doesn’t it?
BitGreen1270@reddit
It was never 2k where I live, at best 3k. But yea, it's a deeply personal decision. I can't defend the choice according to your criteria.
NewToReddit4331@reddit
Just can’t imagine it myself, wishing it works out for you but experimenting with LLMs is nowhere near important enough to spend that type of money in my eyes
These prices are absurd, I’ll be hanging onto my 4080 build until it explodes at this point
BitGreen1270@reddit
Dude you have a 4080. My current GPU is a 780m and the one before that is a 1070 in laptop from 2018 lol
NewToReddit4331@reddit
No I get it, I just can’t imagine dropping that type of money for the sole purpose of experimenting with LLMs
Wishing you the best on your upgrade, it just hurts my soul to see these prices (as someone who passed on upgrading to a 5090 at MSRP, and upgrading to 64GB of DDR5 because “I can get it later”)
I planned on grabbing a 6090 as my next upgrade but if prices don’t settle, I’ll never upgrade lol
knob-0u812@reddit (OP)
that was how I approached my m3 in 2023. I had no idea that hardware prices were going to explode. I looked at it like, "I need to learn by doing" and went for it. I certainly don't regret that decision.
Thanks for sharing your build. Has me thinking
BitGreen1270@reddit
No worries, good luck with your decision. FYI - I also downgraded from x870 to Aorus B850 wifi 7. Only because I have no intention to run multiple GPUs and the x870 can cannibalize pcie lanes if you put a second nvme or something.
thesuperbob@reddit
This situation is why I'm halfassing it with a janky pile of Mi50. Admittedly I impulse bought more than was reasonable, after getting a decent deal on Alibaba, but they definitely are paying off right now.
So with 4 x Mi50 32GB you can run MiniMax 2.7 with a useful bit of context. I tried running it with a lot of context on 6xMi50 32GB, and the speed dropped to unusable levels by the time it accumulated ~64k context, and at 96k context was just a waste of electricity, 5tok/s wasted on repeated thinking and model becoming incoherent.
So I can recommend trying 4 x Mi50 32GB and MiniMax 2.7 at q4, compared to Qwen 3.6 it's really doing a good job for me, until it becomes hopelessly bogged down with context.
As for setup difficulty: on Linux you can just use a gfx906 docker container to run llama.cpp or vllm. Initially I tried to roll my own ROCm build from scratch and it wasn't that bad either, but it sure took a while to get going, and then more time and effort to keep updated.
The excess cards I plan to use for when I finally get the orchestrator setup working, so even if I upgrade in 2027, those will still be capable of performing secondary tasks, even if newer hardware takes over running the "big" models.
Or I can probably just sell them for a while still, and at best profit, at worst not lose much over it.
WishfulAgenda@reddit
First a simple question. Are you using this to make money, science or as hobby?
If you’re using this to make money or for science what does your cost analysis say? Will either option result in a meaningful increase of value generation for the task at hand.
If it’s for a hobby it’s your decision on how much you spend of your money.
In both cases, given the stated goal of having a local inference server, I would argue that you're looking at the wrong comparison and the rtx5090 is an all around bad choice. Instead I would be looking at a 6000 WS vs max Q vs 5000 vs 2 x 4500 etc.
Note: recently went through this and have a max Q on order. My rationale is it’s fast enough, can hold the models I want for now, and I can expand to more of them. Most importantly for my use, the likelihood of generating enough additional revenue to warrant spending that amount is very clear and already proven with a dual 5070ti server I’ve been using.
Good luck in the decision.
ea_man@reddit
The common folk are being booted out of cheap subscription plans (see Copilot these days). I guess there's gonna be a lot of them willing to try to grab some local compute hw.
rdkilla@reddit
5090 rental pricing/hour has been skyrocketing over the last month. if this continues we are in a new crazy like the peak of crypto gpu mining. with what qwen3.6 can do already with 32gb, in a year or so it's only going to be more capable
TurnOffAutoCorrect@reddit
An owner of one gpu rental company provided their own view on the current situation of price increases yesterday in /r/StableDiffusion
https://www.reddit.com/r/StableDiffusion/comments/1tsvt9z/renting_a_gpu_for_use_with_a_service_like_a/op0k2mt/
durden111111@reddit
10k is an absolute robbery for those specs.
knob-0u812@reddit (OP)
If you have a time machine, I'll gladly use it...
ProfessionalSpend589@reddit
Well, with current RAM prices those have skyrocketed too and now they’re unobtainable.
reto-wyss@reddit
That build is USD 10k?
knob-0u812@reddit (OP)
sagiroth@reddit
I wouldn't do it. Even renting for a few years will work out cheaper, but again it's a gamble
awitod@reddit
Do you think supply will increase faster than demand in the near term?
I don’t see any reason to think so.
mohelgamal@reddit
One thing that can cause supply to outpace demand is the power restriction.
All those orders for GPUs, put together, require a massive increase in power production, something like 200 gigawatts.
To get a gigawatt of around-the-clock persistent power output, you need around $1 billion minimum, and the fastest way to deploy that is solar panels and batteries, something like 1,000 acres in an ideal location, which would take about a year.
Gas turbines right now are sitting at a 4-year wait, nuclear reactors are 5-10 years for the newly approved projects, and even those are like 5 gigawatts total.
So that’s a big and very expensive problem, in addition to buying the servers, and if the user subscriptions don’t pan out, there may be a glut of AI hardware that can’t be put anywhere.
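Multiplying those figures out gives a sense of scale. This Python sketch just takes the numbers quoted above at face value (200 GW needed, about $1 billion and 1,000 acres per gigawatt); it doesn't verify them, and `power_buildout` is a made-up helper name.

```python
def power_buildout(gigawatts: float,
                   usd_per_gw: float = 1e9,
                   acres_per_gw: float = 1000) -> tuple[float, float]:
    """Return (total cost in USD, total land in acres) for a given power buildout.

    Defaults are the per-gigawatt figures quoted in the comment above,
    taken as-is rather than independently checked.
    """
    return gigawatts * usd_per_gw, gigawatts * acres_per_gw


if __name__ == "__main__":
    cost, acres = power_buildout(200)
    print(f"cost: ${cost / 1e9:,.0f} billion, land: {acres:,.0f} acres")
```

On those assumptions the power side alone comes to about $200 billion and 200,000 acres, before any servers are bought.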
Tagedieb@reddit
I don't see this happening either. They have been working on the speculation that the value of AI and the value of the hardware needed will skyrocket. So even if they can't use the chips immediately, they will want to buy them. But also the power needed to run AI is taken from the open market. The hyperscalers can just outbid everyone else and then sell tokens at an even higher price.
The only way I see this ending is when there is a reckoning on the market that buyers of tokens (companies) are actually not willing to buy at the price needed to keep this system running. But since there is so much investment in this space, this will probably be a few years at least.
awitod@reddit
Perhaps but even were that true, I don’t see how a glut of data center hardware translates to consumers in the near to medium term.
I think memory may get cheaper because it is easier to make and new suppliers are already entering the scene
fallingdowndizzyvr@reddit
Well.... there's the current Best Buy 5060ti example. It went on clearance for $420, down from $599. I guess even at $420 it didn't sell very well, so this week they reduced it to $300. Today, it finally sold out. Cutting it 50% from the recent pricing and 30% under MSRP to stimulate sales seems to imply a slack in demand. Even at MSRP it would have sold out in minutes months ago.
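The percentages check out against the quoted prices. Here's a quick Python sketch of the arithmetic; the helper names are made up, and the implied MSRP is only what "30% under MSRP" works out to, not a looked-up list price.

```python
def percent_off(old_price: float, new_price: float) -> float:
    """Discount from old_price to new_price, as a percentage of old_price."""
    return (old_price - new_price) / old_price * 100


def implied_list_price(sale_price: float, percent_under: float) -> float:
    """List price implied by a sale price that is percent_under below it."""
    return sale_price / (1 - percent_under / 100)


if __name__ == "__main__":
    print(f"$599 -> $420: {percent_off(599, 420):.1f}% off")
    print(f"$599 -> $300: {percent_off(599, 300):.1f}% off")
    print(f"MSRP implied by $300 at 30% under: ${implied_list_price(300, 30):.2f}")
```

So $300 is just about half of the recent $599 price, and "30% under MSRP" implies a list price of roughly $429.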
slavetothesound@reddit
I saw one at $320 and didn’t buy. Couldn’t justify buying overpriced RAM and other components to go with it since I’m currently Mac only. It was a sad moment but seeing that the shelves were full of GPUs gives me hope for the future.
_Asphadel@reddit
$300 is for the 16GB version?
fallingdowndizzyvr@reddit
Yeah. Out of stock for ordering as of earlier today. But it's worth a minute to check if there's a store around you with it in stock.
suesing@reddit
Whether or not agents can produce meaningful value remains to be seen and to what degree.
Limitless token allowance does not equal limitless value created. At what point do businesses realize the balance?
Openclaw was released q1 2026 and spawned the agentic craze. Everyone and their mother has built it out. So agents can consume the amount of tokens that people never could. Theoretically, demand will never be satisfied. If only tokens were free.
Let’s see if there will be a rebound
CodePalAI@reddit
if you need it for real work, buy enough to stop thinking about it. if its just experimenting, wait. i learned this with CodePal AI stuff too, hardware anxiety eats more time than people admit. either it pays for itself or its a toy budget, middle zone is pain.
KFSys@reddit
The GPU price curve is brutal and I don't see it reversing in the short term. But before dropping $10k on a rig I'd run your actual production workloads on rented GPU compute for a month to figure out what you genuinely need throughput-wise. DigitalOcean has on-demand GPU VPS instances (H100s, A100s) with no commitment. Either you max it out consistently and the hardware buy makes obvious sense, or you find rented capacity covers you fine and keep the cash. The Pro 6000 plus multi-GPU expansion path in particular seems expensive to miscalibrate on.
spammmmmmmmy@reddit
When would you earn the money back?
graypasser@reddit
Honestly I'd say just wait, the GPU economy is absurd and it's rather likely to fall off a cliff at some point.
entsnack@reddit
I remember when we were saying this during COVID. I'm glad I got my 4090 for $1800 then.
graypasser@reddit
Oh yes, but it's obviously not the same situation, and the reasoning behind those opinions isn't the same either. I'm not sure I would've said "just wait" at that time.
Alan_Silva_TI@reddit
Buy now if you REALLY need it to do real work/experimentation/research.
Buy later (at risk of prices being higher) if you just want it because of FOMO.
BlackBeardAI@reddit
I got (almost) the same pc as my 3rd node and never looked back. Just be sure to max out your ddr5 to 256gb.
comp21@reddit
Don't know about hardware prices but I know right now the amd platforms are pretty cheap. The way I figure it they'll eventually be as supported and efficient as nvidia but since they're not now I'm not paying that premium.
I have one R9700 AI Pro card with 32gb vram now and it screams. My second one should be here in a few days. I also have the Corsair 300 AI server with 128gb vram and I run four instances of a 27b model to run my genetic reports quickly.
darktotheknight@reddit
It will eventually come down. Not in 6 months and maybe not in a year, but they will come down. The current crisis is mainly due to memory shortage. SK hynix, Samsung and Micron are ramping up production, China's CXMT and YMTC are basically "still entering" the market, hopefully disrupting the market in 2027.
Of course, how all of that affects the market - especially the US market with all its tariffs and anti-China policy - remains to be seen. But I'm positive, we will see price drops. Cause end of the day, it's just chips and it's not alien technology.
Sofakingwetoddead@reddit
I went through the same thing recently. From my POV - in the past two weeks I did not need to spend a dime on cloud compute and the speed at which I'm completing work has doubled compared to cloud models.
I thought going local would be a compromise - slightly slower speed while only offloading 70% or so of the workload local. I thought I would still be partially dependent on cloud models.
Turns out, it was not a compromise at all. 100% of the workload shifted local and I'm completing work in ~half the time.
Yes, hardware is expensive. Yes, it may come down in price. However, if I had spent the past two weeks using cloud models I would have burned through a minimum of 1.5k and I would be behind where I am. If we did a calculation to determine what it would have cost in cloud compute to implement what we did in the past two weeks - it would probably look more like 50% of what we implemented OR 100% but at a cost of 3k over 3 to 4 weeks.
So, that settles it, then. What are you spending on cloud compute currently? Can local replace your cloud compute? If the answer is you're spending a lot of money for something you can replace with local compute then the answer is simple - your hardware will pay for itself in a short period of time.
Should you wait? If you're in the same position as we were - definitely not.
One thing on your comment about running the two Qwen models - You may not be able to run both at the same time on 96gb.
If you want to take full advantage of your RTX 6000, you'd probably want to be running SGLang on Linux. Your SGLang package is going to be ~87gb for Qwen 27b FP8 KV16. You won't be able to run both models side by side. HOWEVER, you will have blazing speeds with the single model. I mean ingestion at a rate of greater than 15k tps.
That's the trade off. With SGLang properly set up, you can take advantage of MIG, and that's truly where the massive speed increase comes from for us. When we start a fresh prompt and there is a ton of reading required, parallel workers do the reading simultaneously. I would imagine that if you were using an orchestrator, then you'd be able to run other tasks in parallel, as well. Something to keep in mind because you cannot do this with the 5090 setup.
What is your current rig? If you have a last gen PC you may be able to drop the single rtx 6000 into it and not need to worry about future bifurcation or build a new rig. The new rig isn't really going to help you if your VRAM target is 96gb or less. What you need is CUDA and VRAM, not ddr5 or higher CPU clock speeds.
And when you say concurrent - you're not really gonna be able to run them concurrent with the 5090. They will be queued in series, not actual parallel processing. The 5090 literally cannot physically do it.
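For anyone who wants to sanity check VRAM figures like the ~87gb above, here is a rough back-of-the-envelope sketch. It assumes a plain transformer where every layer keeps a full KV cache (hybrid or linear-attention models need less), and the layer count, KV head count, head size and overhead are hypothetical placeholders, not the real config of any Qwen model. Inference servers also commonly reserve a large KV pool up front, so reported usage can sit well above the weights alone.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes). FP8 is 1 byte/param, FP16 is 2."""
    return params_billion * bytes_per_param


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """Bytes of KV cache per token: one K and one V vector per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(vram_gb: float, weights: float, overhead_gb: float,
                      per_token_bytes: int) -> int:
    """Tokens of KV cache that fit after weights and a fixed overhead."""
    free_gb = vram_gb - weights - overhead_gb
    if free_gb <= 0:
        return 0
    return int(free_gb * 1e9 // per_token_bytes)


# Hypothetical architecture numbers, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128
FP16_BYTES = 2  # "KV16" in the comment above

per_token = kv_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM, FP16_BYTES)
w = weights_gb(27, 1)  # 27B parameters at FP8
tokens = max_cached_tokens(96, w, overhead_gb=4, per_token_bytes=per_token)

print(f"weights: {w:.0f} GB, KV per token: {per_token / 1e6:.3f} MB")
print(f"tokens of KV cache that fit in 96 GB: {tokens}")
```

Split the resulting token budget across however many concurrent sub-agents you plan to run to see how much context each one really gets.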
CreamPitiful4295@reddit
I was told not to expect prices to come down for 2 years due to the datacenters.
CreamPitiful4295@reddit
I just got a 5090/128RAM/4TB NVMe for 8K. That was hard to swallow. Running qwen3.6 27B Q4. I love it. I think you want at least 128Ram.
AlwaysLateToThaParty@reddit
The RAM that I bought for my 10th generation intel setup, seven years ago, is more expensive today than it was then. I can't see a drop off in prices happening that quickly.
CoolConfusion434@reddit
It's a gamble but a recent and momentary softening in V/RAM prices came from 1) Google announcing their TurboQuant (reduces need for memory) and, 2) the Chinese plan to soon flood the market with their own chips. Greedy current market OEMs quaked in their boots a little and adjusted prices.
IIRC, the ChinaChip floods are starting end of 26, and will be in full swing starting 2027. It is not typical of Chinese manufacturers to tier, obfuscate, or limit their products. If there's a lot of demand for their product, they simply make more. This contrasts with the withholding and speculating done by current crop of top OEM manipulators.
Looking forward to seeing that shady circle jerk broken: Chip Company 1 invests in AI Company 2, which pledges to buy services from AI Services Company 3, which future-orders chips from Chip Company 1. No one ever transfers any money or completes any of these transactions. But they make billions in gains on publicly traded stock value for the potential future possibility that they might.
punky-beansnrice@reddit
you already have a working prod stack on the M3. the 5090 rig isn't going to make your crons better, it's going to give you a new toy to fiddle with for 3 months. wait until something in prod actually hurts
Enough_Big4191@reddit
if u already have real production workloads, this feels less like fomo and more like infra planning. the people i see regretting gpu buys are usually the ones without an actual sustained use case. that said, i’d probably sanity check whether your bottleneck is truly compute or just agent orchestration + model reliability. a lot of teams buy more gpu before measuring where the system is actually slowing down.
CertainlyBright@reddit
6000 pro just got their second price hike last week. Buy now or get left behind
luvs_spaniels@reddit
Honestly, do a cost benefit analysis at the current price and also look at how current prices increasing/decreasing would impact things for you.
Since you're already using a mix of local and cloud, you've got the raw data. You know your workflow, token usage, etc. Run the numbers, look at it on different time frames, and weigh the pros and cons. Include all current compute costs, including LLM subscriptions, API costs, runpod, colab, etc. Identify what tasks the new GPU setup would handle, their current cost, projected costs, and calculate the payback period to see if there is one. Electricity costs should be in the analysis, too. What happens if part of your workflow uses more/less tokens? Play what if with a spreadsheet.
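If a spreadsheet feels like overkill, the same what-if can be roughed out in a few lines of Python. This is only a sketch: the $10k build cost is OP's number, while the power draw, electricity rate and monthly spend values are made-up placeholders you'd swap for your own. It also ignores resale value and the hardware price moving after you buy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    hardware_cost: float        # up-front build cost in USD
    monthly_cloud_spend: float  # cloud/API spend the rig would replace, USD per month
    avg_power_watts: float      # assumed average draw of the whole box
    electricity_rate: float     # USD per kWh
    hours_per_day: float = 24.0


def monthly_electricity(s: Scenario) -> float:
    """Electricity cost for a 30-day month."""
    kwh = s.avg_power_watts / 1000.0 * s.hours_per_day * 30
    return kwh * s.electricity_rate


def payback_months(s: Scenario) -> Optional[float]:
    """Months until the replaced cloud spend covers the hardware, or None if it never does."""
    net_saving = s.monthly_cloud_spend - monthly_electricity(s)
    if net_saving <= 0:
        return None
    return s.hardware_cost / net_saving


# What-if sweep over hypothetical monthly cloud bills.
for spend in (50, 250, 500, 1000, 3000):
    s = Scenario(10_000, spend, avg_power_watts=500, electricity_rate=0.20)
    months = payback_months(s)
    label = "never" if months is None else f"{months:.1f} months"
    print(f"${spend}/mo replaced -> payback: {label}")
```

Rerun it with a lower or higher hardware_cost to see how much a price move in either direction actually changes the answer.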
For me, dual 5060TIs ended up being the best fit. My workflow uses a combination of cloud and local. MoE models work fine for most of it, and Q4 to Q5 is adequate for me. 32gb local VRAM eliminates my cloud compute bills with room to grow. I do have 128gb (4x32gb) ram, which lets me use Qwen 3.5 122B easily if I need a larger local model. (Btw, 4 sticks ddr5 is an absolute pain to tune and will never be as snappy as a 2 stick. I'm happy with 5200 MHz, which is as good as it'll get with my hardware.)
To answer your original question, the recent contract prices for DRAM suggest prices will go up. It's being driven by perceived institutional demand. For example, OpenAI never had the cash to buy 40% of the world's DRAM output but they signed letters of intent. As crazy as the deals seemed, the market has to treat those like they're real. Available supply goes down, and the price goes up.
But the SpaceX IPO screams bubble. (It reads like Dune and a finance textbook had a really strange kid. However, the staged lockup period and the Nasdaq fast tracking deal suggest that Musk thinks the bubble will pop quickly. This is a rapid exit strategy for early investors and insiders disguised as an IPO.) If it pops, component prices will fall because the companies who signed the letters of intent will be bankrupt. But bubbles are difficult to predict. Greenspan was warning about the irrational exuberance (the dotcom bubble) 4 years before it popped. You can't bet on it popping this summer.
It's a question of risk and costs today and need.
perfopt@reddit
If you used API instead how many days before you spend $10k?
WSTangoDelta@reddit
I did something similar, but at a fraction of the cost. If I told you you'd probably not like it.
theveganite@reddit
I use a 5090 running llama.cpp, nvfp4 model with f16 KV cache and MTP with Hermes Agent and it's fantastic. It's completely usable and handles complex tasks just fine as long as I am prompting intelligently and managing the workspace properly.
Quadrapoole@reddit
Better to just get garbage everything else and just get rtx 6000 pro.
CPU and system ram does not matter for llm
AiGenom@reddit
I think with some new GPUs from Huawei and Intel, plus neural accelerators, prices for RTX cards are coming down...
PigSlam@reddit
With RAM prices being what they are, I think other part prices are constrained because people aren’t buying a GPU for a PC they can’t get RAM for. If you have the RAM already it’s probably a good time to get a GPU. I bought an R9700 last week.
2Norn@reddit
buy now is always the answer
who cares if the prices do not rise further at worst case u got to use it and then still have a product with resell value
if it drops call it cost of operation for the time you used it instead of WAITING
and if the prices rise you won anyway
ttkciar@reddit
I think prices will have gotten lower by 2028, if not sooner.
Between now and then, I don't know. This is an anomalous time.
If you can wait until 2028, then wait. If you can't, then buy.
I'm opting to wait, but only because I already have some decent GPUs obtained pre-RAMageddon. They should suffice until 2028, I hope.
NoFaithlessness951@reddit
What's your case for prices getting lower by 2028?
To me it seems like companies will keep buying all the hardware they can get their hands on and leave very little for consumers.
HayatoKongo@reddit
They might start dumping some of their current hardware onto the second-hand market, right?
a_beautiful_rhind@reddit
Man.. my server cost like 60% of that, even flubbing around with P40s and another mobo over time. You will have 32gb of vram and I have 96.
fallingdowndizzyvr@reddit
Why not 2x5070tis for $700 each? So $1400 for the pair. While no 5090, if TP gets working well it'll get you in the same ballpark.
Riseing@reddit
Buy the 9700 ai cards if you have to buy something now. I did 3090s but only because I already had one. 3090s on eBay are like 1100 now, new 9700 cards are like 1400
MachineZer0@reddit
I believe MSRP of 5090 FE will officially go to $3500. They will still be unobtainium. And then the secondhand prices will go to 5-6k. DDR5 and HBM are what to watch for.
If you are thinking of going local, go all in now. Relief will be in 3 years.
fmlitscometothis@reddit
Buy now. This shit's just getting good and nothing is getting cheaper. Plus you sound like you'd get use out of it.
Gotta spend money to save money!
ieatdownvotes4food@reddit
now. agents are gonna run amok
see_spot_ruminate@reddit
Don't do the 5090, if you are not gaming or doing image gen it is not the best value, especially right now.
The rtx pro 6000 is good if you got the money, but if your goal is just to run the qwen 3.6 models, then maybe overkill.
As always, people leave out the 5060ti. While not the most conventional... I am running the fp8 qwen3.6 27b at around 40 t/s avg (up to 70 sometimes with MTP and coding tasks) and like enough context for 2 users (~500k with vllm). This is on a quad 5060ti and will need some thought about what you do for bifurcation, but it's not that hard. Also, it idles at like 70 watts for all the cards and the system.
Ariquitaun@reddit
Unless you really want to run llms locally because you like it or have specific privacy concerns there's no universe in which I'd pay thousands just to run qwen3.6 35b. Right now the economics don't make sense, and for me it's the only consideration at this point.
skywalker326@reddit
FYI, DDR7 ram will be in shortage at least by 2028Q2
PrettyMuchAVegetable@reddit
For whatever reason, Amazon listed 1 single PNY RTX 5080 OC at Canadian MSRP two weeks ago (May 2026). I saw it at $1449 CAD, 1 in stock, sold by amazon.ca and I pulled the trigger. Everything is pointing towards years of climbing prices.
EbbNorth7735@reddit
We don't know. Various possibilities could occur. AI models are also providing intelligence now. That has value that may not diminish ever. The dollar could continue to decrease in value. War could break out, cutting off the supply of AI chips. AI capital investment may realize the return expectations were simply too much capital for business to adopt. Without customers the boom could stop and prices may snap back to reality as big tech ceases AI roll out. The payoff timescale of their current investments doubles, but companies utilize them to further their ambitions for the foreseeable future. Nvidia and RAM purchasing decreases, causing a price correction. Perhaps the market just gets flooded with 512GB 700GB/s bandwidth Chinese GPUs designed for global AI inference workloads for $5k, immediately killing the cost premiums of American tech. I'd buy it. That 700GB/s soon becomes 2000 and now people have private home AI data centers.
Thepandashirt@reddit
Nobody really knows what happens with hardware prices over the next year. Anyone claiming they do is full of shit.
With that said, I think 32GB of VRAM will be limiting for agentic setups. You want qwen in a higher quant than q4 based on my experience and testing, so budget for a higher quant. I think 48GB of VRAM should be your target which probably means a blackwell gpu or dual 3090s. Maybe swap from AM5 to AM4, so you can save on ram. Your CPU choice and ram have little impact on inference.
But I recommend you try out your setup on some rented hardware to get an idea of actual needs first before blowing 10k. 48GB is what i personally settled on for qwen3.6 27B but your needs might be different. Also plan on using a larger quant. Qwen3.6 is a great model, but its tool calling is garbage in Q4.
knob-0u812@reddit (OP)
thank you
EmPips@reddit
assume tech will always tank in value. Buy when you need the product.
Miriel_z@reddit
There was some positive news about RAM manufactured by China around 2027. Chinese GPUs could later follow this trend, and it will take years. So for about 2 years or so we are stuck. And others said it correctly: none of us can predict the future, we can only assume based on the limited information we have.
Winter-Editor-9230@reddit
https://www.tomshardware.com/pc-components/ram/hbm-is-eating-your-ram
https://cleanview.co/data-centers/us?hl=en-US
Normal-Ad-7114@reddit
Just rent a server, try different configs, see for yourself what you're getting for the money
knob-0u812@reddit (OP)
I use runpod and have experimented. solid call
Bulky-Priority6824@reddit
Even if they did drop prices for the big dog stuff the big dogs will just buy it out. Get what you can when you can.
dryadofelysium@reddit
We can't see the future and there are various ways this could go, but I think one thing we can pretty safely say is that this year (so within the next 6 months) there are going to be no improvements whatsoever. It'll get worse before it gets better, and we do not know yet if 2027 will be better.