What in tarnation is going on with the cost of compute
Posted by Party-Special-5177@reddit | LocalLLaMA | 53 comments
Does anyone know? I can’t even find a server GPU below a B200 on Vast, and for the first time I’ve ever seen on Mithril, H100s/H200s/B200s have all been at over $1k an hour at multiple points last week, for sustained periods! I don’t know why you wouldn’t just migrate to RunPod at that point; even their pricing isn’t that bad.
Seriously, academics can’t afford that, and I’d assume startups would just buy hardware to lock in compute prices. What in God’s green Earth is going on?
FlyingDogCatcher@reddit
supply and demand homie
SnooPaintings8639@reddit
H100? B200? I can't find a used RTX 3090 to extend my rig for under 1100 USD where I live. Two years ago I bought two for 700 USD each.
This is a nearly six-year-old card, and most of the used ones are very abused. And still, they age like fine wine and only get more and more expensive.
I have no idea who is even still buying them up. Is it possible that r/localllama is no longer a niche and we absorbed the entire supply? Seriously, soon the used ones will reach the price a new one had at release, when it was the most advanced card on the market!
EndlessZone123@reddit
We are far past the point of niche.
For comparison: r/gaming gets 2.3M weekly visitors, r/pcmasterrace has 3.8M weekly active members, and r/buildapc gets 1.9M weekly viewers.
redditorialy_retard@reddit
I miss when we were still niche and doing dumb things. The number of HERE IS HOW I DO X AI posts is insane.
oodelay@reddit
I WANT TO MAKE AN INFLUENCER AND GET RICH SUPER EASY
redmctrashface@reddit
And here I am, trying to learn in an ocean full of "look at my 16 gpu rig" and "here is my big Harnessy McHarnessface"
0-0x0@reddit
I think the recent changes to the $20 Anthropic plan (no more Opus) and GitHub Copilot (from per-request to per-token pricing) led people to flock to local LLMs, myself included. I'm impressed by how Qwen 3.6 35B fixes the regressions introduced by GPT 5.4 (high and xhigh through GH Copilot) across my hobby projects. It iterates longer, but it gets things done the way they should be done.
DepressedDrift@reddit
It's all Gooners (me included)
Party-Special-5177@reddit (OP)
It’s definitely rapidly expanding in popularity. I know a guy who went 2x3090 + 4090 -> 3x5090 -> 2x pro 6000 Blackwell. He’s actually the guy who introduced me to localllama in the first place.
…he exclusively uses cloud models now. I pondered that whole situation for a bit. Maybe everyone else is also buying GPUs as a hedge, ‘just in case’, but prefers the cloud convenience until cloud shits the bed or becomes untenable. Or (quite likely), people like the idea of running local models more than the reality of doing so.
darktotheknight@reddit
I just bought a 5090 after getting frustrated with the cloud providers. I experienced regular login issues, and the usage limits are a joke. I can't get any work done with the usual subscriptions, and two months of API spend will buy me an RTX 5090.
My workflow depends so much on LLMs, I can't afford downtime. Hence I'm getting into local LLMs - not to replace cloud, but as a backup/complementary solution.
oodelay@reddit
If you're doing such important work with a consumer card and it breaks, I think it's on you.
starkruzr@reddit
"breaks" how?
oodelay@reddit
Overheating from running 24/7 when it's meant to be a card you game on for a few hours a day.
Pleasant-Shallot-707@reddit
🤣
starkruzr@reddit
that's not going to happen with proper cooling.
666666thats6sixes@reddit
Even with improper cooling. LLM inference can't saturate a GPU the way a modern game can. I can barely hit 200 W on my 7900 XTX, while gaming puts me close to 400 W.
Plus, continuous use is easier on the card than a few large thermal cycles every day.
Pleasant-Shallot-707@reddit
You whooshed that whole thing
a_beautiful_rhind@reddit
Inflation is crazy too, combined with everyone wanting compute, and now fuel costs. Proportionally, all your money went from 700 to 1100.
SnooPaintings8639@reddit
That's 57% over two years, a 25% yearly compounded increase, vs an official CPI of 3.3%. I really wish the money in my bank account went up that much. Or my base salary, which is what most people use to buy new hardware with :(
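If anyone wants to sanity-check that arithmetic, here's a quick sketch (prices taken from my comment above):
```python
# 700 -> 1100 USD over two years
old_price, new_price, years = 700, 1100, 2

total = new_price / old_price - 1                    # ~0.571 -> 57% total
annual = (new_price / old_price) ** (1 / years) - 1  # ~0.254 -> ~25%/yr compounded

print(f"total: {total:.1%}, annualized: {annual:.1%}")
# total: 57.1%, annualized: 25.4%
```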
a_beautiful_rhind@reddit
Sounds about right. I was buying ground beef for 1.99 on sale and now it's 4.99 on sale. 3% my ass.
Pleasant-Shallot-707@reddit
3% is an average over a specific basket of goods and services.
hdhddf@reddit
Look at a whole PC with a 3090. For me it's about 700-800 for a 3090 alone, or about 950-1200 for a whole PC with a 3090 in it.
MarcusAurelius68@reddit
Yup. Just built one with spare parts and my son’s old 3090 Ti (I upgraded him to a 5080 last year). Spent a few dollars to upgrade to 128GB and a 5700X, but otherwise it was all on hand. First time playing with a local LLM.
Dany0@reddit
I'm betting the AI labs that were left behind are now scrambling for any compute they can get their hands on. And I don't mean just Anthropic.
Party-Special-5177@reddit (OP)
My personal fear is that it’s all the ‘autoresearch’ Ralph-loop guys who just recently got booted off subscription plans, switching to GLM or similar.
One of those guys said elsewhere in this sub that ‘all money made from autoresearch is best spent on further compute for autoresearch’. I figure it couldn’t possibly be that lucrative, but I can’t think of anyone else who would see 4-figure hourly compute prices and shrug.
Fast-Satisfaction482@reddit
How are they even making money?
Party-Special-5177@reddit (OP)
I had no idea, I had to ask Claude.
Apparently, bug bounties. You let a bot iterate over a codebase indefinitely until it finds a bug. Small bugs are usually worth a couple hundred bucks, but the big ones (proper exploits with real risk) can be worth thousands. As long as the compute you spent finding a bug costs less than the payout, you make money.
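To make that concrete, here's a hypothetical break-even sketch; every number below is made up purely for illustration:
```python
# All figures are made-up assumptions, just to show the break-even logic.
gpu_rate = 30.0       # assumed $/hr for a multi-GPU instance
hours_per_bug = 50    # assumed compute hours to surface one valid bug
avg_bounty = 2_000.0  # assumed average payout per accepted report

compute_cost = gpu_rate * hours_per_bug
profit = avg_bounty - compute_cost
print(f"compute: ${compute_cost:,.0f}, profit per bug: ${profit:,.0f}")
# compute: $1,500, profit per bug: $500; positive only while payouts outpace compute burn
```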
Pleasant-Shallot-707@reddit
The new bitcoin mining: bug mining
Dany0@reddit
At least there's clearer value in this.
Fast-Satisfaction482@reddit
I'm not even mad, that's amazing.
Mickenfox@reddit
It's not that great
Fast-Satisfaction482@reddit
Their idea is having people pay money in order to report bugs. Lol.
coloradical5280@reddit
I can attest to being in that group; however, to be fair, it’s a 2B vision (YOLO) model. But yeah, it really does work. Most of the actual compute is still on API, with Claude/GPT-5.5 doing the research on what to try next, and the actual trying consists of training the vision model for a couple hours a day, spending 10x what it would have cost six months ago. Shit, even three months ago.
challis88ocarina@reddit
It's a perfect storm. I'd guess project glasswing has a lot of large companies in a bind, patching their software. Think of banking apps, some of which have been rewritten entirely.
I also noticed that Qwen just pulled its free tier mid-month. It was so generous as to arouse suspicion (I stopped after it kept lifting .env files just so it could iterate, smh).
muyuu@reddit
The bubble is making hissing sounds.
florinandrei@reddit
A bajillion people are all trying to do what you do, that's what.
johnnyApplePRNG@reddit
Ever try quitting avocado toast?
Worked for me!
Party-Special-5177@reddit (OP)
…what?
Pleasant-Shallot-707@reddit
They’re making fun of boomerisms
Dany0@reddit
It's a reference to boomers claiming the young generation can't buy a house because they spend too much money on stuff like avocado toast
cutebluedragongirl@reddit
We are entering the dark ages of personal compute. Abandon all hope. There is no escape. The beatings will continue until morale improves. You will own nothing and be happy.
Ikkepop@reddit
It costs that much because that's how much businesses are willing to pay. Supply and demand. Compute providers are also businesses and have no obligation to keep prices where academics can afford them. That's the sad truth. Wait till the AI bubble pops; then prices will come down. But until then, sucks to be us.
cohesive_dust@reddit
Sufferin' succotash!
FullOf_Bad_Ideas@reddit
Agentic coding with Hermes/OpenClaw and the closed flavors is growing, and demand can grow faster than physical hardware supply.
I don't think non-infra AI startups are buying actual hardware. Even OpenAI doesn't really own the GPUs they're using.
https://gpulist.ai/ is still listing multiple 1024-GPU clusters for rent, so I bet there are still thousands of idle GPUs out there, but the owners want to contract them on monthly/quarterly terms rather than flexible on-demand pricing. So it's probably artificial scarcity.
boutell@reddit
That explains why I caught Dario Amodei in my living room hooking up little dongles to all of our devices. He just keeps muttering "gotta scale Pro.. gotta scale Pro..."
inigid@reddit
Does Modal still have free tiers? Haven't used it in a few months, but it used to be nice for small ad-hoc jobs.
HomeWinter6905@reddit
I will have 4x H200 available soon, with 1TB of RAM / 128 cores. What sort of long-term pricing works for you?
Party-Special-5177@reddit (OP)
Depends on ‘soon’. Right now I have to stick with 8x, but that won’t be the case once I can ship this tool.
Uptime guarantee doesn’t matter; I literally have none right now anyway, and if I can’t spin up an instance in time, I just try again later. Is your rig SXM? I’ve never tried NVL, but literally none of my projects play nice with PCIe. What are your net up/down speeds?
I have no idea on pricing, as I don’t have a good sense for what’s good long term. For example, for an 8x, I used to think 8 was high, then it became a steal when I was paying 14ish, and I’m paying 31/hr all-in on Vast right now. That’s far higher than I want to be, but I’ve been sitting on my thumbs for days waiting for compute pricing to come back down (and it hasn’t, so here we are). So, as I’ve said, I really don’t know what’s good anymore.
ShelZuuz@reddit
Allow "Unverified Machines" in your vast search. Vast takes forever to get around to verify machines, sometimes months. I have high-end commercial hardware on there that hasn't been verified in 3 weeks.
Providers are completely at the mercy of someone from vast getting around to manually connect to the machine to cause it to be verified. And there isn't any criteria extra that verified gives you rather than just 90%+ uptime. It just means a human physically looked at it, but you can ascertain if a machine is flaky in the first 3 minutes of a rental period anyway.
Also generally people with unverified machines will also list it a little cheaper.
Party-Special-5177@reddit (OP)
Turned it on, which yielded exactly one new 8xH200 lol. Thanks for the tip.
I have some stories from the other side of things. Five days ago, I snatched an 8xH200 instance at a great price (18/hr), had it for an hour, and was kicked off. It now shows offline status, and if you mouse over the warning sign, the tooltip says ‘a maintenance is currently underway. It is expected to end at August 30 2037’ lmfao.
Just yesterday, I had an 8xH100 expire after just one session. I’m honestly hoping it isn’t me doing something wrong.
ShelZuuz@reddit
Are you getting On-Demand or Interruptible instances?
Party-Special-5177@reddit (OP)
On demand
Salty-Policy-4882@reddit
The cost-of-compute conversation usually skips the most actionable layer: most teams don't actually know where their tokens go. Flat monthly bill, no per-feature breakdown, no idea which agent loop is the leak.
ccusage (13.6k ⭐, github.com/ryoppippi/ccusage) reads ~/.claude logs locally and prints daily/session/5h-window breakdowns plus a per-model split. Costs are the same for everyone, but you stop being surprised by the bill; most people find one or two runaway loops they didn't realize were there. It's pluggable into your statusline, so it's visible while you work.
Separate but related: the routing layer matters more than people think. Once you have visibility, sending the cheap calls (rehydrate, summarize, classify) to Haiku/Flash and reserving Sonnet for actual reasoning usually cuts spend 40-60% with zero quality loss on the cheap-call side. I wrote up the measurement + routing combo at tokrepo.com/en/workflows/ccusage-real-time-token-cost-tracker-claude-code-170532fa.
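If it helps, a minimal routing sketch; the model names below are placeholders, not exact API identifiers, so swap in whatever cheap/expensive pair your stack actually uses:
```python
# Route mechanical calls to a cheap tier, reserve the big model for reasoning.
# Tier names are placeholders, not real API model IDs.
CHEAP_TASKS = {"rehydrate", "summarize", "classify"}

def pick_model(task_type: str) -> str:
    return "cheap-tier (Haiku/Flash)" if task_type in CHEAP_TASKS else "reasoning-tier (Sonnet)"

print(pick_model("summarize"))  # -> cheap-tier (Haiku/Flash)
print(pick_model("plan"))       # -> reasoning-tier (Sonnet)
```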