Are local models becoming “good enough” faster than expected?
Posted by qubridInc@reddit | LocalLLaMA | 103 comments
One thing we’ve been noticing lately is that a surprisingly large percentage of day-to-day AI workflows no longer seem to require frontier-scale cloud models 24/7.
For a lot of practical tasks:
- code explanation
- structured edits
- summarization
- retrieval-heavy workflows
- boilerplate generation
- lightweight agents
…smaller/local models are getting close enough that the economics start looking very different.
The interesting part isn’t necessarily “local beats cloud.”
It’s that more people seem to be moving toward workload-aware setups:
- local models for fast/repetitive tasks
- cloud reasoning only when needed
- dynamic routing between models
- optimizing for latency + cost, not just benchmark scores
Feels like the conversation is shifting from:
“Which single model is best?”
to:
“What’s the smartest architecture for the workload?”
Curious how others here are thinking about this.
Are local models already good enough for most of your daily workflows, or are frontier cloud models still doing the heavy lifting?
JLeonsarmiento@reddit
Models yes, accessible local hardware not.
qubridInc@reddit (OP)
That’s fair. The software side moved insanely fast, but consumer hardware economics still feel awkward once you move beyond smaller quantized models.
Feels like there's a missing middle layer right now: a lot of people can technically run strong local models, but not necessarily in a way that's efficient, scalable, or accessible for normal users/businesses yet.
That’s why the next few years on the hardware side are probably just as important as model progress itself.
Tiny_Recording6633@reddit
Local models, hardware, apps, and data explode over next 5 years. Don’t think Mac & Co in Cupertino don’t see this clearly. They are likely all over it. We will see it soon from them and many others. Stakes are too high to ignore.
suprjami@reddit
The proprietary orgs were predicting this as far back as 2023, only 6 months after the release of ChatGPT:
https://newsletter.semianalysis.com/p/google-we-have-no-moat-and-neither
I'd actually say open-weight models are going slower than expected.
qubridInc@reddit (OP)
That memo aged surprisingly well in hindsight. The core idea that “distribution + iteration speed” could matter more than pure model secrecy seems much more obvious now than it did in 2023.
What’s interesting is that even if open models may be progressing slower than some expected on absolute capability, the hardware + inference side improved massively at the same time. Quantization, routing, longer context handling, better serving stacks, cheaper VRAM access - all of that compounds.
So even if the raw intelligence gap still exists at the frontier, the practical usability gap for many workloads shrank much faster than people anticipated.
SketchyMitch@reddit
Thanks for sharing this. It’s cool to look back and see how people were thinking about this 3 years ago.
StyMaar@reddit
Local LLMs and open-weight models aren't the same thing.
If you include the big ones (Deepseek, GLM, Kimi), open weight models have pretty convincingly shown that the proprietary AI makers had no moat indeed.
But they still have a big moat compared to local LLMs: you can't have 8x GB200s at home to run SotA models.
Deep90@reddit
I think the problem is more that purpose-built hardware isn't really hitting the market, and datacenters are incentivized to run the most state-of-the-art models, not the 'good enough' ones.
darktotheknight@reddit
I think the problem regarding open weights is that training these models is a multi-million dollar venture with no guaranteed outcome. You want to make sure you're getting your investment back. The large open weight models like DeepSeek, Kimi K2.6 and GLM-5.1 are merely a PR stunt to get your attention, as most people don't have systems to run these models at a reasonable speed anyway. If Z.ai or Moonshot AI were the market leaders and not Anthropic/OpenAI, you can bet your money they wouldn't release their weights.
I think overall, the biggest losers in this are the academic and public research community. They don't have the funds to compete with the industry and they have to use closed source models for their research, if they want to stay relevant.
DeepOrangeSky@reddit
Also, on this sub-topic, that reminds me of another thing I've been wondering about. Some of these top American universities have huge endowments and lots of serious talent. Harvard (50+ billion endowment), Stanford (20+ billion), MIT (20+ billion) - that's such a huge amount of money, with roughly 80% of it pre-directed to specific uses by donors and 20% general-purpose, that either way (a few hundred million of the pre-directed money pointed at training SOTA local models, or a few hundred million of the general-purpose money - both categories are in the many billions each year) they could use a relatively small, borderline "chump change" slice of their money to do full, frontier-level training runs. They could even do it on their own hardware if they wanted, given how much money they take in each year, let alone if they just did it via Trainium or whatever.
I wonder if we might get some actual strong open weights models from Universities like MIT or Harvard or Stanford. With them making them for their own use, and then also sharing it openly, too, since they are a University, so everyone else gets to use the models as well, as a nice side effect.
Maybe with how strong AI is getting, they'd get nervous about doing this. They might worry that the rest of their donors would stop giving if they were angry about ~1% of the money being used to build AI that will take the jobs of their sons and daughters when they graduate, so they wouldn't want to risk the other 99% of donations by using 1% to make super strong local AI models?
But in terms of just raw money and capability to do it, I think they'd be able to do it. And not just one university either, but several of the top few (especially the ones I specifically named).
DeepOrangeSky@reddit
Although I tend to agree with you (to me it seems like only Nvidia - and I guess AMD, Intel, Samsung, Micron, SK Hynix, etc., the hardware companies basically - have much real incentive to release open weight models to the public), I am curious what your thoughts are on why Google released the Gemma models (including Gemma 4 just recently, with the 31b dense model being strong enough to cut into things fairly significantly - it wasn't just some useless toy, it can do "real things" and cannibalize stuff a fair bit), and why OpenAI released OSS 120b a few months back.
You were saying that if Z.ai or Moonshot were in the shoes of American frontier labs like Google or OpenAI, they wouldn't be open-weighting models. But we've seen Google and OpenAI open-weight some fairly significant models. Not 1T models, to be fair, but these weren't just little 2b or 4b toys either. They were strong enough relative to frontier models at the time of their release that they should at least partly fall under your argument that there's no good reason to release them, right?
Personally I am a bit confused by it and not sure why they do it, tbh. I'm sure they have some reasons, but it does seem a little weird, since they definitely aren't doing it to be nice or for charity. They are for-profit and have some actual hard reason for doing these things, but it must be somewhat convoluted or non-obvious - for the American frontier labs that are doing it, I mean. The Chinese side has significantly more obvious reasons by comparison.
I saw an interesting youtube vid recently that was hypothesizing about this exact topic, in regards to Google and why they would release Gemma/Gemma4, btw. Curious what you think about that guy's theory about the reasoning behind it.
phein4242@reddit
The biggest issue with Anthropic/OpenAI is that they need to produce ROI. This means that, unless they find a magic goose, limits will be raised and subscription prices will follow accordingly. And up until now, there isn't much to show for all those billions invested. That is, if you disregard Palantir and Lavender ofc…
That does not apply to open weight models. ;-)
darktotheknight@reddit
Agreed on the ROI pressure.
Disagreed that it doesn't apply to open weight models. Why do you think GLM-5.1 weights were released? Or Kimi K2.6? For charity? Truth is, running and training GLM is no less expensive than Claude or GPT. Z.ai's or Kimi's subscriptions are in the same price range as Claude's or OpenAI's - if not even more expensive. They release the weights to increase their market share: so people know their name, check out their website, maybe buy a subscription if they like the model's performance.
Like Qwen, WAN is also Alibaba - and look what they did over there: they went from open to closed weights. Once they reach a certain threshold/market share, releasing the weights is no longer a priority. Same with Stability AI and SD/SDXL.
Going forward, we shouldn't take open weights for granted, because at the end of the day these models are released by profit-oriented companies, not charities.
suprjami@reddit
Part of getting better is getting more efficient.
Compare Mistral 7B and Qwen 3.6 9B. These are very different despite both being ~8B models.
Qwen 3.6 27B and Gemma 4 31B are beating ~120B models from a year ago.
Hardware requirements for having genuinely good local models have decreased.
The average wage slave can afford 2x 3090, as opposed to an H100, which costs more than their car.
Deep90@reddit
That also applies to hardware, and we have seen it for models, but hardware is still lacking.
Moscato359@reddit
The moat these companies have is they're buying up all the gpus, so nobody else can have them
I read the other day that 11% of xAI's GPUs are deployed?
l33t-Mt@reddit
Anthropic is now using their compute.
Moscato359@reddit
They still need the hardware to be deployed, installed in datacenters, and provided water and power
Having a customer doesn't fix that
l33t-Mt@reddit
https://x.ai/news/anthropic-compute-partnership
Moscato359@reddit
That tells us that anthropic is paying for the deployed compute
But it says nothing about how much of their compute is deployed
zxyzyxz@reddit
That's why they're giving the excess to Anthropic now
Moscato359@reddit
Anthropic needs datacenters to put it in too
Blues520@reddit
Damn so they are stockpiling them
Moscato359@reddit
I've heard Google and OpenAI are under 50% deployed.
suprjami@reddit
Well, graphics cards are only expensive because-
Ah, that's OpenAI digging their moat a bit deeper.
john0201@reddit
Qwen 27B and Gemma 31B are beating frontier models from 6 months ago. And on a few scores, frontier models from 2 months ago.
MarcusAurelius68@reddit
I just cobbled together my first server with a 3090ti and a 12GB 3060 and early results are pretty promising.
En-tro-py@reddit
Slowed, as expected... The majority of compute and RAM has been bought up and hoarded behind closed API tokens...
That has raised the $$$ required to be a new entrant in the AI arms race to absurd heights.
rpkarma@reddit
It's more that we can't run the great open weight models on most hardware.
I'm playing with Step 3.5 Flash right now, and it's genuinely impressive (token efficiency isn't high, but its accuracy is surprisingly good for agentic coding and code analysis).
Problem is, I can't run it on my 128GB Spark with MTP: vLLM can't load GGUFs, all the other Q4-equivalents are too big to fit, and MTP support hasn't quite landed in llama.cpp yet.
There's other models too that would be so good that need more than most things have: MiniMax M2.7 would be amazing as an open weight model if people could run it
eclipsegum@reddit
Honestly they are better for many use cases than closed weights. Abliterated Qwen models are probably the best general purpose chatbots. No stupid refusals that make you feel like you need to walk on eggshells, and much more truthful discussion of "controversial" topics. Much better Socratic partner.
qubridInc@reddit (OP)
A lot of people underestimate how much “UX friction” matters in real-world usage. Even if a frontier model benchmarks higher, constant over-refusal or excessive steering can make workflows feel slower and less natural.
That’s probably one reason local/open models are gaining traction as day-to-day thinking tools or coding companions - people value responsiveness, controllability, and conversational flexibility almost as much as raw intelligence now.
The interesting shift is that “best model” is becoming highly context dependent. For some workloads, alignment strictness is actually a feature. For others, it becomes operational friction.
Double_Ad9821@reddit
And you are not offloading your thoughts to the cloud
aboutthednm@reddit
What's your favourite abliterated qwen in the 0 to 14b range? There's such a breadth to pick from with such different qualities among them, I'm curious what has been giving you the best results. I'm always looking for something new to try out.
The_LSD_Soundsystem@reddit
And best of all it’s private
Big_Wave9732@reddit
I've been running Qwen 3.6 35B on a Mac Studio M2 192GB all this week. "Good enough" is a phrase that has crossed my mind several times.
I can't wait to see what others release to try and keep up.
qubridInc@reddit (OP)
Honestly feels like we crossed an important threshold recently. A year ago people were mostly benchmarking “can it run locally?” - now the conversation is shifting toward “is the quality gap worth the infrastructure + API cost for this workload?”
The interesting part is that once a model becomes “good enough” for 80% of repetitive tasks, latency/privacy/control start mattering a lot more. Especially on setups like yours where the hardware is already capable of handling serious workloads locally.
Going to be very interesting watching the next wave of open models compete on efficiency, routing, and specialization instead of just raw benchmark flexing.
javatextbook@reddit
Could it run on a Mac mini 64gb m4 pro?
armaver@reddit
M2? Not M3?
Hello_my_name_is_not@reddit
M3? Not M99?
What type of comment is this lol?
armaver@reddit
The comment of someone unsure if such a thing is really possible on an M2?
Hello_my_name_is_not@reddit
Come on, are you really not sure if a device that came out in 2023 with 192GB of unified RAM can run a 35B A3B MoE model that people have been able to get running on GPUs that came out in 2021 with 12GB VRAM lol? (RTX 3060, by offloading the MoE layers to CPU.)
If you prompt your local models anything like how you asked that question then the answer is no you will not be able to have a local model like qwen3.6 35 A3B be good enough on M2 or M3.
You can't expect it to read your mind lol you have to put in a bit of effort
Even now I'm still not sure what you're trying to ask. "Is such a thing possible" can be interpreted in many different ways.
Yes, there's like 160GB more RAM than is needed to load the model, so it runs on an M2 with 192GB with no issues.
Or are you asking token speed?
Or maybe about context size?
How about speed once context goes up?
Or maybe you're asking about which quant he's using with the M2?
Maybe how big his project is and what he's utilizing the M2 and model for?
What if this whole time he's been lying and he's actually using two M3's stacked on top of each other with a trenchcoat and they are just pretending to be an M2?!?!
armaver@reddit
I'm sorry you are so angry dude. Here, have a hug :)
Hello_my_name_is_not@reddit
Good 1, and I'm sorry you still don't understand how to ask a question properly
Big_Wave9732@reddit
M2.
sarl__cagan@reddit
Any idea how it does with tool calling?
Kyunle@reddit
I used a fixed Jinja template to make it work better 🤷
GrungeWerX@reddit
It’s actually great with tool calling. At first, I was all 27B - and I still am, it’s the smartest - but most of the time lately I’ve been using the 35B because it’s blazing fast.
the_fabled_bard@reddit
It's like 98%+ with tool calling. I think twice it made mistakes in the tool call and then would spiral into making the same mistake over and over again. Bad seed. Clear conversation, start with new seed and problem solved.
Substantial_Step_351@reddit
I think once you look at the mechanism you can understand why the spiral happened. The model isn't just confused by the failed call; it's now reasoning about its own prior reasoning about the failed call, and that's still in context. By retry 2 or 3 it's essentially conditioning on a chain of its own confused output. Clearing state removes all of that, which is why it works. I believe the more surgical fix is truncating failed tool history at the harness layer rather than resetting the full conversation. Full clears fix the loop, but they lose everything else too.
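A rough sketch of what I mean by truncating at the harness layer, assuming an OpenAI-style message list (the helper name and the error heuristic are mine and purely illustrative; a real harness would check the actual tool result status):

```python
# Hypothetical harness-layer fix: drop failed tool-call rounds from the message
# history before the next retry, instead of clearing the whole conversation.
# Assumes an OpenAI-style message list; the error heuristic is illustrative only.

def prune_failed_tool_rounds(messages: list[dict]) -> list[dict]:
    """Return a copy of the history with failed tool-call rounds removed."""
    pruned = []
    i = 0
    while i < len(messages):
        msg = messages[i]
        # An assistant message that requested tool calls...
        if msg.get("role") == "assistant" and msg.get("tool_calls"):
            # ...followed by its tool results.
            results = []
            j = i + 1
            while j < len(messages) and messages[j].get("role") == "tool":
                results.append(messages[j])
                j += 1
            # Heuristic: treat the round as failed if any result looks like an error.
            failed = any("error" in (r.get("content") or "").lower() for r in results)
            if not failed:
                pruned.append(msg)
                pruned.extend(results)
            i = j
        else:
            pruned.append(msg)
            i += 1
    return pruned
```

The rest of the conversation stays intact, so the model keeps its useful context but loses the chain of its own confused retries.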
ASYMT0TIC@reddit
If only humans could do this so easily
the_fabled_bard@reddit
You might be right! In my case it was dumb stuff like doubling a letter in my project path and including the doubled letter over and over again in tool calls. Similar to the strawberry problem.
Dumb problem, dumb solution. Better solution might exist but I don't know how. I commit a lot, resetting the convo for a different solution seems like normal living with qwen3.6-35.
Big_Wave9732@reddit
No idea, sorry.
AvidCyclist250@reddit
Surprisingly well. Not as good as 27b but good enough
MerePotato@reddit
Gemma 4 26BA4B replaced cloud models for all but the toughest queries for me. As far as coding is concerned 31B is stellar.
aguspiza@reddit
They are not getting good enough... they are fucking good *today*.
arsenale@reddit
I use Max on Claude 4.7 because the mistakes that the lower tier versions make are simply unbearable.
I wasted hours on wrong answers, and now I use the most expensive thinking even for simple tasks.
sophlogimo@reddit
They are good enough. I have an A3B Q6 orchestrator model for fast jobs, and a bigger, dense code validator model that I let the orchestrator call with very small context to verify if everything's correct. Works almost as well as Claude did before.
my_name_isnt_clever@reddit
What software scaffold/harness do you use for this workflow?
sophlogimo@reddit
Just opencode, with a subagent definition "call this model for code validation, give it all necessary context". And the subagent simply has the prompt "your job is to check the code for errors, if you need more information to make that judgement, decline approval and say what information you need".
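If you're not on opencode, the same orchestrator/validator split is easy to wire up by hand against any OpenAI-compatible local server (llama.cpp, vLLM, whatever). A rough sketch - the port, model names, and prompts are placeholders for your own setup, not anything opencode-specific:

```python
# Sketch of the orchestrator -> validator pattern against a local
# OpenAI-compatible endpoint. Model names, port, and prompts are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

def validate_code(code: str, requirements: str) -> str:
    """Send only the code and its requirements to a bigger, dense validator model."""
    resp = client.chat.completions.create(
        model="validator-model",  # e.g. a dense coder model; placeholder name
        messages=[
            {"role": "system", "content": (
                "Your job is to check the code for errors. If you need more "
                "information to make that judgement, decline approval and say "
                "what information you need."
            )},
            {"role": "user", "content": f"Requirements:\n{requirements}\n\nCode:\n{code}"},
        ],
    )
    return resp.choices[0].message.content

# The fast A3B orchestrator does the main work elsewhere and only calls
# validate_code() with a small, task-specific context.
```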
SketchyMitch@reddit
I think the smarter AI companies are already positioning to become the distribution layer for local models, as they have no clear moat with how close open source local actually is. Demis from DeepMind talks about it in this, about 8 minutes in: https://youtu.be/JNyuX1zoOgU?t=479&si=_D3PdBL8H5I0R_6d
Opening-Broccoli9190@reddit
I believe that the big tech will lobby for laws limiting the use of local models to support their oligopoly on the market.
cosmicnag@reddit
Even if they did, they can't really enforce it because different countries, different jurisdictions.
Opening-Broccoli9190@reddit
Well, the EU already has extensive regulation and is now moving to pass another safety act, further restricting what AI can be used for. The US already applies protectionist measures to all sorts of goods and will likewise be able to restrict usage of Chinese or European models in business and government under the guise of protection from foreign interference. The US has already passed some really non-obvious measures, like age verification even on Linux systems.
Anthropic is already pushing for ethical-only use of AI, OpenAI are known for being rather shady, Google and Facebook are known to use lobbyists and collaboration with the legislature.
I hope you're right though and it never happens
cosmicnag@reddit
They can make it annoying but cant really ban it (too decentralized) , similar to how Bitcoin has survived nearly two decades now.
Kyunle@reddit
I was using qwen3.6-35b q6 for a couple of weeks but then switched to qwen3.6-27b q6 with MTP for better quality. Using it for all sorts of coding and personal tasks. Don't feel a need for a SOTA backup anymore 😌
chafey@reddit
Local LLMs have been good enough for these use cases for at least the past 6 months, but they required substantial hardware to run and a big time investment to operate. I invested in 2x RTX PRO 6000s earlier this year and have been using Qwen3.5-122b exclusively for JavaScript coding for the past month - I haven't had to use a cloud model once. Even better, it runs at ~180 tokens/second, which is much faster than the cloud models! The more recent models like Qwen 3.6 are even better, and some people are having luck getting them to run on consumer GPUs like the 3090, 4090, and 5090.
This_Maintenance_834@reddit
With the recent community work on enabling MTP, qwen3.6-27b can fly on a 32GB GPU. A 5090 gets you 100 TPS with NVFP4. Words fly across your screen.
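Not a full guide, but the core of the vLLM side looks roughly like this. The repo ID is a placeholder and I'm not certain which quantization/MTP flags the newest builds want for this model, so treat it as a starting point rather than a recipe:

```python
# Rough sketch of serving a quantized ~27B model on a single 32GB GPU with vLLM.
# The model ID is a placeholder; quantization/MTP options for the newest models
# may need extra flags I'm not sure about.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-org/your-fp4-quantized-27b",  # hypothetical repo that fits in 32GB
    max_model_len=32_768,                      # raise this if your KV cache budget allows
    gpu_memory_utilization=0.92,               # fraction of VRAM vLLM may claim for weights + KV cache
)

params = SamplingParams(temperature=0.2, max_tokens=512)
out = llm.generate(["Explain what this function does: ..."], params)
print(out[0].outputs[0].text)
```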
cosmicnag@reddit
I have a 5090, is there a straightforward enough guide to getting that model to rip with at least 100k context?
South_Hat6094@reddit
the gap isn't model quality anymore, it's tooling. routing between local and cloud based on task complexity still requires custom glue code that most teams won't build
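to be fair the glue itself isn't rocket science - something like this toy sketch, where the endpoints, model names, and the complexity heuristic are all placeholders for whatever a real team would actually tune:

```python
# Toy sketch of local-vs-cloud routing. Endpoints, model names, and the
# complexity heuristic are placeholders; a real setup would use something
# smarter than length/keyword checks.
from openai import OpenAI

local = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
cloud = OpenAI()  # reads OPENAI_API_KEY from the environment

HARD_HINTS = ("prove", "architecture", "refactor the whole", "multi-step plan")

def looks_hard(prompt: str) -> bool:
    """Crude complexity check: long prompts or 'hard' keywords go to the cloud."""
    return len(prompt) > 4000 or any(h in prompt.lower() for h in HARD_HINTS)

def ask(prompt: str) -> str:
    client, model = (cloud, "frontier-model") if looks_hard(prompt) else (local, "local-model")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

the hard part isn't this code, it's deciding what "hard" means for your workload and keeping that decision maintained.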
ericatclozyx@reddit
Literally just wrote about this — there should really be a progressive enhancement approach to this which is already standard practice for many other software workloads.
Expensive-Paint-9490@reddit
Yes. In 2023 I would not have thought that in three years you could have had models much better than GPT4, running blazing fast on a single 24GB GPU. Or even on 32GB RAM at decent speed.
Hydroskeletal@reddit
if you mean "here's a prompt, go do this long horizon thing and deliver me the 90% solution" - no
if you mean "I can write a program that uses LLMs to do all the inference/judgement things" - yes.
Off the shelf local models now trounce the fine-tuned, custom trained models I had a year ago and it isn't even close.
matjam@reddit
By end of year we will fully cross the threshold of Claude Opus 4.6 capabilities, and the answer will be a strong yes.
Right now I’d say it’s a tentative yes within certain bounds
tredbert@reddit
Wish I could affordably get my hands on more VRAM than the 12GB my 4070 has.
LinkSea8324@reddit
Good enough is dataset related, not architecture related.
Architecture mostly determines how fast you can train/infer.
Karyo_Ten@reddit
Copy-pasted post https://www.reddit.com/r/LocalLLM/s/Qdlco4aK2Y
Then they will recommend their service. Pure fake engagement
ihatebeinganonymous@reddit
There is local, and then there is local :)
I am getting the feeling that the bigger AI labs are concentrating their efforts on 1T+ (or 500B+) models, seeing them as their only chance to rival the frontier.
Zeeplankton@reddit
This is the first time I'm actually using Deepseek (albeit via API) for work. Which I think is huge. I've been using Qwen 3.6 27b on my m3 max but the real problem is still TTFT, despite streaming tps being fine now.
The problem, to me: opening a provider site takes 1-2 seconds, and typing a query, sending it, and getting an answer can all happen in less than 10s. Asking Qwen feels like dial-up while the model loads in.
ptear@reddit
Or frontier is not moving as fast as expected.
tired514@reddit
In my case, local is better than free cloud services as of late.
Granted, the capabilities of expensive frontier models vastly exceed what my little qwen3.5-122B-A10B is capable of, even with my carefully designed prompts and tool suite, and they're all much faster.
But something seems to have happened to the free services over the past few months. They're getting dumb. Like.. really, really dumb.
Just in the past week I've asked Gemini, ChatGPT, and my local model (with web-search, sequential thinking, memory, etc.) for help with a handful of real troubleshooting problems.
In all of these cases the output I got from my local instance was vastly more complete and more importantly: correct. The free cloud models really struggled with figuring out the firefox audio issue in particular, leading me in totally wrong directions. My model found the cause immediately with a link to the bug tracker.
So, at least compared to free cloud LLMs? We're there. Local isn't just good enough, it's superior.
Ok, this is on an evo-x2 I paid $5k CAD for. I could have bought a lot of cloud time for that price. So it's not a fair comparison.
Still, I think it's a preview of what's to come. Cloud providers are doomed in the near-mid future because as AI-competent hardware and 128gb RAM machines start to become standard commodity hardware, fewer and fewer people will be able to justify paying per-token.
My_Unbiased_Opinion@reddit
3.6 27B is a maniac (in a good way) as my hermes agent.
aeroumbria@reddit
I think it is not an unreasonable expectation that the "large" language model is an inefficient developmental phase of this class of intelligence, and that we will continue to chip away at the inefficiencies as we develop the technology and explore the theory better. We may see a gradual shrinking of "intelligence volume" similar to what we saw with physical computers.
Exciting_Garden2535@reddit
Most of the time I throw a design doc carefully prepared by fancy Opus-4.6 at Qwen-3.6 35b or Gemma-4 26b, and both can usually find critical, or at least high-severity, issues in it. Then I throw the feedback back to Opus, and it apologizes and fixes them. I have Kiro paid for by my employer, so I stick with Opus in Kiro for creating design docs, but I'm seriously considering whether it's even worth it.
Turbulent_War4067@reddit
Still trying to put together my financial research/portfolio management stack. From what I can tell so far: models are smart enough, but too slow, and the pieces are not ready for primetime. Well-integrated search is a big problem too. I can do a lot on a cloud-based frontier model without an extensive RAG setup; doing the same thing with a local model seems difficult.
AvidCyclist250@reddit
Have you taken a look at nous hermes with Obsidian? Not sure if that's viable for you of course. But maybe the harness has other skills you could use to search your documentation
Turbulent_War4067@reddit
I haven't, will do so. Thanks for the tip.
ortegaalfredo@reddit
GPT-5.5-nano is already good enough for almost everything.
Qwen3.6-27B is 10X better than that. Not even mentioning Deepseek-v4-flash.
a_beautiful_rhind@reddit
Imo:
“What’s the smallest model for the workload?”
But I think generalist models themselves are plateauing. Local can definitely cover well defined tasks like you mentioned. It's literally what they are training for.
Regular people aren't going to faff with any of this and just use the cloud as cheaply as possible. Can't get the hardware, don't want to bother with the software. Businesses and enthusiasts are who will use them.
Calm-Republic9370@reddit
I just wrote an app that does projections for real estate - a bunch of complex formulas - based on data in another language that I don't speak. I'm not saying a different programming language, a Slavic language. I had to translate the request into English so I could properly have the model design it, then I output it and sent it back to the end user. Qwen 27b is great.
brown2green@reddit
Even for my basic but somewhat niche coding needs I still have to use Gemini 3.1 Pro.
I have no idea if open-weight models larger than what I can fit within 24GB of VRAM can compete. I'd say local models are being held back by artificial memory / memory bandwidth bottlenecks (i.e. costs).
Silver-Champion-4846@reddit
Even the frontier ones sometimes struggle, especially if you're doing unconventional experiments.
darktotheknight@reddit
100% this. You're building a web UI for your favorite CLI app or a website for selling framed cat pictures? Yeah, these models will help you out. You're reverse engineering a Windows driver, so you can run your USB device on Linux or you're porting Coreboot/OpenBMC to a server mainboard? Good luck with that, your cat will give you more support than even the frontier models - at least emotional support.
Silver-Champion-4846@reddit
The only support you'll get from your frontier LLM dog is "You're absolutely right"
Radium@reddit
Why do you think they chopped the RAM leg off with all the investment money?
Miriel_z@reddit
Every time I say something like "No way it's gonna happen", it happens next week. I learned to keep my mouth shut and prepared to be amazed. In short, it should happen.
ThinkExtension2328@reddit
Sir say the thing!! People need this
Miriel_z@reddit
Fine! No way we will have sane government and honest companies who do not treat us as disposables and let us have what we need. RAM, GPU, privacy, security, and much more. Let's see now.
ThinkExtension2328@reddit
And now we wait 👌
RemindMe! 7 days
RemindMeBot@reddit
I will be messaging you in 7 days on 2026-05-14 22:40:07 UTC to remind you of this link
iamapizza@reddit
Could you by any chance express "no way" at the price of RAM and GPUs coming down greatly soon?
Miriel_z@reddit
No way it will go up again!🤣
JuniorDeveloper73@reddit
I can't believe we have something like Qwen 3.6 27b to play with.