Anthropic admits to have made hosted models more stupid, proving the importance of open weight, local models

[-]

ClaudesExFriend@reddit

i regret having moved my team to anthropic, now our whole company is using it. if i knew they would do shady shit like this i would have never recommended moving from openai to them...now its hard to move back and convice non programmers(HR,CEO etc) that they are scamming people...

[-]

gthing@reddit

Did nobody here read the article? OP is full of shit. No model degradation happened. Anthropic rolled out some bad updated to Claude Code (which is not a model - it's a harness) on accident, discovered them, and then fixed them and reset everyone's quota. It's not hard to understand.

[-]

EntryRadar@reddit

yeah and people don't believe them. It shouldn't take them a few weeks to several weeks to come clean on this. They should spot regressions within hours or days and announce them immediately.

[-]

gthing@reddit

People who pay attention to evidence believe them. The regression only effected claude code and no other third party harnesses or integrations using the API.

It's possible they are messing with their API models, but I would only believe it if there was evidence, not because of vibes.

[-]

EntryRadar@reddit

How would they not be testing for regressions when doing adjustments to adaptive thinking within the harness? It's negligence if not.

[-]

gthing@reddit

Right. You should read the article. They admit they were not testing the production harness, they were testing with their own internal harness. And now they have shifted their policy so that they are using the public facing harness more internally for testing.

[-]

kevinlch@reddit

https://www.anthropic.com › constitution

Broadly ethical: being honest, acting according to good values, and avoiding actions that are inappropriate, dangerous, or harmful;;

yeah. 100% good guy

[-]

micseydel@reddit

That's for Claude, not Anthropic!

[-]

anythingall@reddit

My hands did the killing, not me!

[-]

ProfessionalSpend589@reddit

It’s harmful to shareholders to run models in full quality when users can be happy with a lot less for same money.

[-]

landed-gentry-@reddit

This has nothing to do with the models and everything to do with the Claude Code harness.

[-]

EntryRadar@reddit

That's no excuse, they should identify regressions within hours/days and announce them immediately. Not a few weeks to a several weeks. It's completely unacceptable, disrespectful and shady otherwise.

[-]

landed-gentry-@reddit

What makes you think they knew before it was announced/fixed?

[-]

Smallpaul@reddit

This headline is an out and out lie and most of the commentary is based on that lie.

Abthropic’s harnesses changed. Their prompts and tools. Not their models. There was no quantization, distilling or otherwise dumbing down of actual models. API users were unaffected. They said this explicitly.

[-]

EntryRadar@reddit

That's no excuse, they should identify regressions within hours/days and announce them immediately. Not a few weeks to a several weeks. It's completely unacceptable, disrespectful and shady otherwise.

[-]

Smallpaul@reddit

Just to be clear, I was not trying to offer an excuse. I was trying to be technologically accurate.

[-]

ThisWillPass@reddit

Check for malware injection prompt is not a change?

[-]

xanduonc@reddit

Sure, except cursor suddenly decided to run all subagents in a max mode of their own model instead of opus. Wasted quite a handfull of tokens on that.

[-]

dydhaw@reddit

No. Can you not read? The first change you list was an overridable client side configuration to fix a UX issue because the UI would appear frozen with higher reasoning modes. The second one was was also specifically for UX not server load, and was a bug. Third one is the only one you could possibly spin as "lower server load at the cost of quality" but it was just a prompt change, you know how finicky those can be when it comes to output quality. They reverted all of these changes when they realized quality was impacted, so it makes no sense to accuse them of purposely reducing quality.

I'm all for open weights and running local, there are plenty of reasons to support local LLMs without lying or twisting reality.

[-]

EntryRadar@reddit

That's no excuse, they should identify regressions within hours/days and announce them immediately. Not a few weeks to a several weeks. It's completely unacceptable, disrespectful and shady otherwise.

[-]

gffcdddc@reddit

Just ordered a 5090 bc of this and the codex cybersec flagging issues

[-]

Automatic-Arm8153@reddit

For all those people that were doubting saying we are stupid for suspecting this.

There direct from the source.

Also this is not the first time. Last few times they said it was server bugs. But we all know what’s up..

[-]

JonMcElyea@reddit

I said this at work at just got stares from everyone.

[-]

Mayion@reddit

ChatGPT has become beyond stupid for a while now. It's like it developed a weird personality and keeps on repeating the same mistakes, over and over.

[-]

lfrtsa@reddit

The free version of chatgpt is frustratingly stupid, I just don't use it anymore. Sometimes I check chatgpt's answer to a question or problem I have that is not trivially simple, and it's always way worse than Claude and Gemini. Is OpenAI using a tiny model for the free tier? Like 14b parameters or something. Maybe 100b max. It's clearly not a strong model.

[-]

Bakoro@reddit

and worse of all, it has an insufferable paternalistic tone, all while being confidently incorrect.

Actually, let me interpret what you said in the stupidest possible way, and then argue against that point, as if you were both ignorant and stupid.

Now let me take the ideas you have described, and explain them back to you without adding anything meaningful, but reframe it as if I'm correcting you about something.

I will continue by reflecting your statements as if I were teaching you about a topic.

I'll throw in some pandering compliments and end in a question hook or leading statement about how I have super special knowledge you want. That's where it gets really interesting.

[-]

Apprehensive_Rub2@reddit

Honestly claude follows this pattern way too much too.

[-]

jazir55@reddit

The ChatGPT whisperer has spoken

[-]

kuhunaxeyive@reddit

ChatGPT free version is so much worse than Gemma-4-31B now. I didn't expect to get to the point of trusting my local model that I am confident of getting the right answers most of the time, and not trusting ChatGPT at all so that I just don't want to use it anymore. But here is where we are. ChatGPT feels like a 7B model now.

[-]

jazir55@reddit

it has an insufferable paternalistic tone

[-]

VicemanPro@reddit

Its a chat model, not designed for complex reasoning of any sort. They list it in the model name.

[-]

prestodigitarium@reddit

I've been wondering how much of the bimodal distribution in opinion about LLMs comes from this - I think they're amazingly useful, but I've been using Opus mostly and local Qwen 3.5, and paid ChatGPT before that, but a lot of people presumably use the free one, or are forced to use Microsoft's awful horde of copilots by their jobs, and probably see a very different reality.

Versus bias from fear of loss of job opportunities, and any bit of evidence that these things actually suck is a bit comforting.

[-]

No_Flounder_1155@reddit

even simple answers are weird. I'm getting lots ofnweird argumentative behaviour. I'll ask for a review of something and itll srart talking about something unrelated. its freaking weird.

[-]

philanthropologist2@reddit

I dont have this problem. Using Codex 5.3

[-]

Western_Objective209@reddit

when people say stuff like:

It's like it developed a weird personality and keeps on repeating the same mistakes, over and over.

99% of the time the problem is in their workflow, and they hit the limits of their workflow, but instead of trying to fix things they blame someone else.

I know I will get downvoted for saying this, but I have worked through these issues dozens of times myself. I have never felt like the model suddenly became so stupid it's unusable, I have always managed to work through it

[-]

itchijiro@reddit

It’s just wrong to handwave 99% of issues away as “workflow”. Providers continuously change model routing, system prompts and safety layers in production without announcing every tweak. (Why would they?) If I spin up a fresh session and a tightly specified task that worked 20 days in a row suddenly derails, that’s not me “hitting my workflow limits”, that’s behavior drift on the provider side. If your use cases keep working, that's great, but that doesn’t invalidate people whose setups sit closer to the edge of what the model or safety stack allows and therefore break when the backend quietly changes.

[-]

Western_Objective209@reddit

well these threads are full of people saying this announcement validates what they said, and it just doesn't. If you have a workflow that is barely held together and small tweaks to system prompts or model routing completely destroy it, your workflow is fragile and either needs to be reconsidered or you need to stop using claude code for something it's not meant to be used for.

I have much more confidence that what happens is new model gets released -> unlocks new capabilities for users who have no skills of their own -> these users saturate the capabilities of the model -> the users complain that the model is suddenly stupid because they don't have the skills to make further progress and break out of the models base line limitations

[-]

itchijiro@reddit

You’re arguing against a different claim.

The point wasn’t “new model released, users hit their skill ceiling, then complain.” That happens, sure.

The point was: people were talking about an already released model whose behaviour seemed to shift over time.

That can happen without a new base model. Change the system prompt, routing, safety layer, context handling, tool policy, or product wrapper, and the user-facing behaviour can change immediately.

So the “fragile workflow / skill issue” explanation may cover some complaints, but it doesn’t answer this one. It just reframes the issue into something easier to dismiss.

[-]

Western_Objective209@reddit

the complaints about it getting stupid came shortly after the release and persisted since; the larger the gap between the models (4.6 was a big improvement over 4.1) the bigger the hysteria is when users saturate the capabilities

[-]

EndlessB@reddit

Have you tried 5.5 yet? I refuse to give them money to try it myself, but I’m curious

[-]

sparood1@reddit

I have been using codex for a while next to Claude and 5.4 was already better and 5.5 blows Claude out of the water

[-]

_4k_@reddit

Well, at 2.5x cost it better blow

[-]

redditorialy_retard@reddit

The non codex models the AVG user uses in chatgpt.com

[-]

BasisPoints@reddit

Go into your settings and turn off the silly "personality" completely

[-]

Stepfunction@reddit

Same here, Codex 5.3 has been my go-to for vibecoding through GitHub Copilot.

[-]

Conscious-Map6957@reddit

I have been using it constantly with Thinking, and more often than not Extended thinking, and I cannot complain.

Though I will say I noticed that Extended Thinking has become several times shorter in the pas few weeks, so it won't say outright wrong things but will miss nuances it usually didn't in that mode.

[-]

mga02@reddit

Is it the free tier or paid? I'm trying Plus for work and so far it has been far better than what Gemini pro achieves now. I know Gemini is under high demand and quality degrades, but it's got to a point where it would just not generate an image and delete my message if I give it a complex task.

[-]

Mayion@reddit

Paid. Extended thinking enabled or not, it still behaves irregularly. I would tell it I only want to do a certain thing then it proceeds with half the paragraph about that other thing if I am interested in.

It might sound helpful to you, but when it is told to explicitly only answer my questions and still doesn't, it adds up. The clutter, the confusion. Claude on the other hand seems to know when to shut up.

[-]

limitz@reddit

God chatting with GPT about a sport I practice (fencing), and it is frustratingly stupid.

Like it doesn't understand that people are mirrored and once you introduce handiness it makes really really dumb mistakes all while showing you the same irrelevant pictures incessantly.

[-]

brother_spirit@reddit

Which version. I run GPT 5.4 in Codex daily and it's been mostly stable for the last month. Occasionally minor weird behavioural drifts/derps but par for the course and not seeming to occur in a cluster (indicating a change of intelligence under the hood). Subjective observations of course but I was absolutely noticing Sonnet 4.6 backsliding in quality during that time.

[-]

Psychological-Lynx29@reddit

Maybe they uploaded Sam personality to gtp

[-]

dizvyz@reddit

Gemini fast too. They want the Chinese companies to eat their lunch.

[-]

billndotnet@reddit

Tell it to keep a running log of solutions, and to pay attention to when it repeats one.

[-]

Fit-Produce420@reddit

Anyone using a model enough could tell.

[-]

ConsciousStruggle5@reddit

If this industry has even less companies, they will only go more towards profit maximization

[-]

dtdisapointingresult@reddit

Do you have a source on that? Source? I need an expert-approved, fact-checked source, otherwise you're just doing a hecking conspiracytheorino.

What do you mean you tried to make educated guesses based on your past observations and recognizing patterns of behavior of tech companies? This is quite problematic, possibly bigoted.

[-]

SlimPerceptions@reddit

People really think these companies don’t restrict model uses and blame users. How naive can they be thinking everything is transparent and set in stone.

[-]

pier4r@reddit

Especially as subscriptions are heavily subsidized. The provider needs to optimize.

And I do not mean to reference what Cursor computed ($5K) but rather a simple observation "how many tokens the average claude code user would use and how much would that cost via API? Then take away the margin of the API with an educated guess, how much does it cost for the provider all of it?"

I think especially since claws are around, token usage exploded and it is breaking the subscription models that providers have developed in the past.

[-]

simracerman@reddit

There’s more bots than real humans in these subs. Don’t let them gaslight you.

[-]

pier4r@reddit

Not really, even so called "experts" (I mean those releasing fine tuned models and so on) called it bs, that individual experiences didn't matter, reddit complaints even less and so on.

[-]

Mickenfox@reddit

The funny part is they didn't break the hosted models. They broke the local client.

[-]

artisticMink@reddit

Do people actually people read the article?

The changed the -defaults- because people tend to use high reasoning efforts for trivial tasks. Same with verbosity.

You could still set it to higher manually. The model wasn't "dumbed down".

[-]

dizvyz@reddit

That's what the company claims. Are you hearing "claude is so much better" now from anyone yet? Wait a while.

[-]

finevelyn@reddit

The March 26 and April 16 changes seem completely outside of the user's control to me.

[-]

artisticMink@reddit

You're right. The changes were bugs and reverted (and without ill-intend i figure) but they affected people negatively. I id over-read that.

[-]

finevelyn@reddit

Not ill intent, only the intent of maximizing profits. As we all know, the optimal strategy is to first lock in users by offering a valuable service at under cost, and then enshittify the service. To their misfortune, the users were in fact not locked in yet so they had to back down for now.

[-]

Western_Objective209@reddit

yeah it's obvious these changes wouldn't cause the complaints people were talking about. it's room temp IQ level discourse, feels like gamers getting into coding because the barrier to entry is finally low enough

[-]

kurtcop101@reddit

If you're asking a quick question, you want it to be quick - going in and setting model reasoning down for quick questions isn't going to happen.

Even with the option provided, most people will set it up high or xhigh or max and leave it there - and then usually still complain about usage.

It's just our nature. I'm guilty of it too. The only way to tackle it is to have a better classification model that can determine more accurately what we need when we ask. And, a better baseline model on the low end, so we don't have any worry about the quality of the answer on simple questions.

[-]

dizvyz@reddit

I agree with you but I have a feeling people will keep saying the model is shit now because;

1) Because they probably did more things they are not talking about

2) they lost the trust completely. People will not be looking at them the same way and this will influence their perception of the tool.

[-]

Firm-Fix-5946@reddit

just lmao

[-]

Craftkorb@reddit

Gemini also has weeks where it's really good and then others it's eating crayons

[-]

Veastli@reddit

For all those people that were doubting saying we are stupid for suspecting this.

Few have doubted it.

But we all know what’s up..

Do you?

Because every day I read ridiculous justifications here for why this is happening. "It's a rug pull". "It's a scam". "They're prioritizing businesses over users"*

No, no, and no. The answer is simple.

Anthropic is out of compute.

Wha. wha. whaaaat?

Yes. Their existing customers are using too much compute, and their growing popularity makes the situation worse by the day.

Anthropic is adding compute as fast as they can, but it's not fast enough. It probably won't be fast enough a year or longer.

Anthropic has enough money to buy compute, but it's sold out. Building new data centers takes time. They just had to trade large equity shares in the company to get additional compute.

TLDR - Yes, the service is degraded. No, it's not an evil scheme. The reality is that Anthropic is out of compute. Even if they stopped new subscriptions today, it wouldn't be enough. They can either mass cancel accounts entirely, or degrade them.

[-]

Western_Objective209@reddit

yeah because changing the default thinking and changing token limits really correlates with the behavior users have been complaining about like it completely stops working

[-]

lemon07r@reddit

The amount of comments I got saying it was a skill issue when I posted about opus 4.7 being stupid.. I bet they will still choose to die on their hill.

[-]

Murinshin@reddit

I was never doubting bugs in Claude Code being a thing, but I was and and still am doubting people are using these tools correctly or are largely knowing what they're talking about. You can probably boil down 80% of people complaining on the corresponding subs to people never having read the manual or checked environment settings, or even not being aware that there is an effort setting in the first place.

Same with this very thread - this postmortem is about Claude Code and tools based on the same SDKs, none of these degraded the model in itself e.g. for API users. Yet you wouldn't know that from the title. This is a pretty damn important distinction because one of the most common theories with paid providers is that they deploy quants after some time which leads to performance degradation.

[-]

CryptographerKlutzy7@reddit

I note this is in claude code, not the models.

[-]

neotorama@reddit

I said this multiple times but some people here said that’s bs

[-]

cutebluedragongirl@reddit

Local is freedom.

Maybe in like 10 years we will finally be free

[-]

JacketHistorical2321@reddit

Hahaha, ya... Cause the world has evolved towards more freedom over the years

[-]

myreala@reddit

No but it has always evolved towards cheaper hardware.

[-]

MBILC@reddit

Such as?

Compare hardware prices today vs 10+ years ago even accounting for inflation... high end GPU's costing $2k USD + for entry models now? vs back then you could get a high end GPU for sub $1k USD easily..

[-]

Kat-@reddit

Fact check: yup.

The highest-end consumer GPU from 2016 was the NVIDIA Titan X (Pascal), released on August 2, 2016 at an MSRP of $1,199 Technical City .

With inflation adjustment to April 2026, that $1,199 is equivalent to approximately $1,650 in today's dollars (based on a cumulative inflation of 37.58% In 2013 Dollars from 2016 to 2026).

For context: the Titan X Pascal had 12 GB of GDDR5X memory and 3,584 CUDA cores running at 1.5 GHz Technical City . It was top-end consumer card that year, though the GTX 1080 at around $700 was typically considered the more sensible high-end gaming choice for most people. $700 from 2016 is equivalent to approximately $963 in April 2026 dollars (Using the same 37.58% cumulative inflation rate: $700 × 1.3758 = $963.06).

[-]

MoneyPowerNexis@reddit

Shouldn't you be comparing what you could get for $1,650 today vs the NVIDIA Titan X (Pascal) back then?

[-]

MBILC@reddit

Ya exactly, you can not compare buying a 10 year old card for $200 today.

Take NVIDIA highest end card 10 years ago, cost + inflation and tell me you can buy a 5090 for that price, not even close.....so no, hardware has not gotten cheaper, as it should.

[-]

MoneyPowerNexis@reddit

Why are you picking the high end card rather than the card in the same price range and comparing performance?

Take some other product say mangoes. In japan in 1980 there where no miyazaki mangos you could buy. 5 years later you could buy them but they cost 10 times as much as regular mangoes. Does that mean mangoes cost 10 times as much? no but it does mean the best mangoes you can buy cost 10 times as much. To know if in that 5 years the cost of mangoes went up or down you would have to normalize for the real quality range you are talking about.

For compute that would be asking if an amount of say FP16 coasts more or less or an amount of VRAM or an amount of memory bandwidth.

Otherwise you get bazar situations where if NVIDIA pulled all of its product lineup except their cheapest GPU, say they decided all GPUs with more than 4GB of VRAM are now for the datacenter you would have to say the price of GPUs consumer went down because the best GPU you can buy now costs less.

[-]

MBILC@reddit

The point is, 10 years ago, you could buy the highest end card for X amount, that was the best at the time you could buy.

Today you can buy the highest end card for X amount, and it is the best out right now.

You have to do an apples to apples comparison based on what was / is available at the time.

You have:
- Low end
-Mid Range
-High end

As we were often told as technology gets better and smaller, it can get cheaper and more power efficient, but we are going the opposite, sure smaller and more transitors, but more power and heat also.

[-]

MoneyPowerNexis@reddit

To me this just seems like a really dumb take. A fair comparison is always what you could buy with an amount of dollars then vs what you can buy with the same inflation adjusted dollars today.

Going by the arbitrary category "high end" does not tell you if compute has gotten cheaper or more expensive.

I mean right now I do all y gaming on a minipc with an iGPU that shits all over the Titan X and the whole system cost less than half of the cost of just the GPU. It may not play the latest games at the highest settings but it does platy the latest games at default settings.

Whats happened is that the high end of games and of hardware has expanded to include games that are bloated and cards that are priced for people who are less price sensitive. But even with these cards being marketed to less price sensitive people they are still cheaper than the titan X in terms of cost per performance.

You have to frame it in a very specific way to make it at all plausible but then if you are going to frame it that way stop saying it in the general way:

As we were often told as technology gets better and smaller, it can get cheaper and more power efficient,

all of those statements are literally true

but we are going the opposite, sure smaller and more transistors, but more power and heat also.

only because we are adding more of them. cost per transistor has gone down but we are demanding more transistors in the same package. again its like fruit has halved in cost and now we want 4 times as much fruit in a box and we are saying boxes of fruit cost twice as much. Technically thats correct boxes of fruit with 4 times as much fruit in them do cost twice as much when fruit prices halve. You can say that and I wont argue but to then say fruit costs twice as much just makes you sound either dumb or disingenuous.

[-]

danielv123@reddit

Sure. Pick up an 5080 for $999 (or whatever it goes for today) and start up llama 7b or whatever you need to work on the titan x pascal.

The 5080 will be about 5x faster on preprocessing at FP16, double it for sparsity and lower quants if you want that. It will also be at least 2x faster at single stream TG due to the memory bandwidth.

Lets try to find an evenly matched card, like the 5060Ti 16gb. Its still over 2x faster in preprocessing even with the best case scenario for the titan, but close enough. It costs $429, a fair bit less than $1650.

I think hardware has gotten cheaper.

[-]

Ouitya@reddit

You can get what was a high end gpu 10 years ago for quite a low price nowadays. Titan X went from $1650 to $200. So hardware did get cheaper.

[-]

edgedepth@reddit

Is the cheaper hardware in the room with us?

[-]

eldrolamam@reddit

Yeah no need to be cynical. For less than a Subway you can buy a computer orders of magnitude more powerful than what took humans to the moon

[-]

MoneyPowerNexis@reddit

compared to the 80s?

[-]

cmdr-William-Riker@reddit

Only reason I use them now is because I still have the remainder of the year paid for, so now I eat all the tokens in the 5 hour window to generate a plan with claude, let it start implementation, then when it runs out of tokens I switch to Qwen3.6 and finish off the projects which is working amazing. If I can find a way to get qwen or another model to generate and iterate on large scale plans, I won't need foundation models for anything next year

[-]

ttkciar@reddit

I'm all-local now, implying I'm free now?

[-]

windxp1@reddit

I have a dream...

[-]

BidWestern1056@reddit

yeah fuck them use local https://github.com/npc-worldwide/incognide https://github.com/npc-worldwide/npcsh

[-]

Sudden-Complaint7037@reddit

tl;dr: we are completely out of compute

[-]

Perfect-Flounder7856@reddit

Why I invested $15k in an AI workstation to get away from cloud frontier model reliance. See the writing on the wall in this sub reddit!

[-]

OneSlash137@reddit

Boy you really showed big tech who was boss by going and doing that….

[-]

Perfect-Flounder7856@reddit

😂

[-]

Ambitious-Wind9838@reddit

You're misjudging the costs. "I hired a guy to take pizza orders, but he was secretly replaced by a guy who doesn't speak English, and I lost a lot of customers." You need to consider the damage from a sudden deterioration.

[-]

OneSlash137@reddit

None of that made sense.,. Your local model serve that up for you?

[-]

Ambitious-Wind9838@reddit

Yes, local models can take orders, moderate chats, and much more. Even those that fit on a single consumer-grade graphics card.

[-]

spencer_kw@reddit

the vindication thread was inevitable. people spent months getting told they were imagining things and now there's a postmortem with a timeline. the frustrating part isn't that it happened, it's that the community had to diagnose it themselves through vibes and side by side comparisons because there's no external monitoring that catches this stuff.

this is why the open weights argument keeps winning. it's not about running llama on your laptop for free. it's about the weights not changing underneath you on a tuesday because someone pushed a bad config. the model you tested last week is the model you deploy this week. that guarantee is worth more than any benchmark.

[-]

AlarmedTowel4514@reddit

Problem is, not even the best consumer gpu can run something like opus 1.7.

[-]

Bootes-sphere@reddit

Hosted model degradation is real, and it's not unique to Anthropic. The incentive structure is perverse: API providers optimize for cost per token and latency, not capability. They're running inference at massive scale with quantization, batching tricks, and sometimes lighter-weight variants than the flagship model.

This is exactly why the open-weight movement matters. When you run Llama 3.1 or Mistral locally, you control the full stack—no hidden optimization layers, no "we tuned this for production efficiency." What you get is what you trained.

That said, hosted models are still useful for benchmarking and for tasks where capability-per-dollar beats raw performance. The real play is knowing which tool fits which job, not treating local as universally superior. Some of us just need fast, cheap inference for routing logic or filtering. Local isn't always the answer.

[-]

rm-rf-rm@reddit

This post veers foo far from the truth and is driven by narrative/emotion/bias. Personally I share the sentiment of the overall message. But as a mod, I thought it important to call out the hyperbole - the post has been flair-ed as Misleading so that people don't take away a conclusion from the title itself (the reality is most people won't bother reading the post body let alone the linked article)

- Anthropic didn't make the "models dumber" in the way it implies - quantization etc. They changed defaults to optimize token spend (aka reduce their burn rate and be a profitable business), hardly as heinous as its being made out to be. Ironically, there may be several other shady things that they may be doing (reducing limits sneakily, resetting limits out of cycle like happened yesterday) but that is speculation/hearsay.

- That said, this is the structural reality of for-profit businesses - they will always optimize for their profit and not for users benefit. Thus, it is crucial that us users have options and most importantly, the ability to own our AI.

[-]

arsenale@reddit

How can they be more profitable if most max users burn through all the tokens anyway?

And if there's a faster response, people post more response anyway... so it's not a linear road to profitability.

I still think that it was a mistake on their part.

Surely the have way less gpus that OpenAI, I had found a precise claim for that, maybe someone can confirm.

[-]

FatheredPuma81@reddit

That said, this is the structural reality of for-profit corporations (especially one that is aiming to IPO soon) - they will always optimize for their profit and not for users benefit.

I would like to say that I believe this is mostly an issue with public companies and companies in fields where lots of money is flowing fast and everyone is trying to make a quick buck (a gold rush). Many private companies actually value their image and customer loyalty enough not to do stupid things like model quantization. I believe (I hope) in a decade or so once the bubble has popped or died down we'll see some of these companies become major contenders like Valve.

[-]

mynamasteph@reddit

didn't make the "models dumber" in the way it implies - quantization etc. They changed defaults to optimize token spend (aka reduce their burn rate and be a profitable business)

Quantization is not the only way to make a model "dumber" or optimize for profit.

[-]

pakeke_constructor@reddit

But they didn't make the models dumber. They changed the system prompts and the settings, the models themselves stayed exactly the same. Title is insanely clickbaity and misleading

[-]

TastesLikeOwlbear@reddit

“Profit” and “Anthropic” really don’t belong in the same context window.

[-]

IkeaDefender@reddit

1) They changed a setting, which while not a good idea was transparent to the user.
2) They introduced a change that tried to make rehydration feel snappier (which is good for the user) in doing that they introduced a bug which did make the responses worse, but it was an unintentional side effect.
3) They tried to make it use fewer tokens, (which is an outcome people want), but they ultimately determined that the tradeoff in quality was too high.

2 and 3 were feature work that users want. 1 was a bad idea but transparent.

[-]

colin_colout@reddit

They changed defaults to optimize token spend

...in Claude Code. The models (like from API directly...not in their harness) was fine.

I wouldn't care about Anthropic nerfing their slop harness, but they they block us from using our own. Unfortunately they are still on the top end (not by much...but when you work on projects of certain size or in platforms that are very sensitive and complex, Opus is still on top)

Other companies are watching to see how it goes for them. Expect OpenAI to be next to cut the subscription plans off from third party agents.

[-]

Cuplike@reddit

The average person using open weight models build their opinions based on Ollama's default context settings because they don't even know they should, Imagine how much less discerning cloud users are. Changing the default is tantamount to changing the quality

[-]

ChatWithNora@reddit

Worth reading the actual postmortem instead of just the title. All three issues were in the harness (system prompts, caching logic, default effort level), not the model weights. API users were unaffected. Doesn't make it okay, but it's a different problem than "they quantized the model." The real argument for local isn't that providers secretly swap weights. It's that you can't audit what sits between you and the model.

[-]

Successful_Plant2759@reddit

The title oversells but the conclusion is right. The postmortem is clear that the bugs were in the harness — system prompts, caching logic, default reasoning effort — not the model weights themselves. The model itself wasn't 'made more stupid.' But that distinction actually strengthens the open-weight argument, not weakens it: even when the model is fine, you have zero visibility into what the harness around it is doing. If Anthropic silently switches Code's reasoning effort from high to medium for a month, you can't audit that. With open weights served by yourself or a transparent provider, you can see the full pipeline. That's the actual case for local — not 'closed models will lobotomize you', but 'closed harnesses can lobotomize you and you'll never know'.

[-]

No_Clock2390@reddit

gemini has gotten way dumber recently

[-]

aeroumbria@reddit

If you are using a proprietary harness, your are doing it wrong. If your whole workflow cannot seamlessly migrate to a different provider or server with a single line API address change, your are doing it wrong.

[-]

kvothe5688@reddit

I downgraded from 200 max to 100 max. Thinking about stopping it. Codex is working fine on 20 pro plan which is equal to 100 max I think. May be up to 80 usd worth tbh

[-]

Ylsid@reddit

The top open models usually sit somewhere between lobotomised and full power models. You could probably get them via API for much cheaper.

[-]

Economy_Cabinet_7719@reddit

I've done the switch months ago. Codex has a harsh personality/voice, but aside from this it's just as capable (if not more) and I never need to think about rate limits.

[-]

SamSlate@reddit

dumber models burn more tokens, what's broken?

[-]

pedroanisio@reddit

Claude Code is almost useless today. Deferring and saying that things are too complex...

[-]

pc_4_life@reddit

all changes they mention are changes to the harness not the model. same thing could happen using something like opencode

[-]

vivekkhera@reddit

I see none of those things affecting Claude API usage. All of this is in your control when using the API.

[-]

micseydel@reddit

If they're being sneaky like this in other things, what makes you trust the API?

[-]

my_name_isnt_clever@reddit

There is zero evidence they have done anything like this via the API. They can do whatever they want with their scaffolding software, but they plainly state that on the API specific model strings never change.

AWS and Azure serve the same exact models but controlled by those companies, and if these claims were true you could mesaure a difference. But such data has never been found, because all claims of API models becoming stupider are just based on vibes.

[-]

micseydel@reddit

if these claims were true you could benchmark a difference in the endpointsif these claims were true you could benchmark a difference in the endpoints

Could you give an example of such a benchmark?

[-]

lztsrts@reddit

Even with the subscription, whenever I use it, it's on VSCode with the extension, and I can just set the effort manually to whatever I want.

I can't see anything in the link saying it was silently downgrading my settings.

Unless this is what the OP is implying, that it WAS ignoring my settings.

[-]

Important-Radish-722@reddit

But... if the models were not thinking as hard and giving lower quality results then users would have to keep asking more questions, and that would use more tokens.

Good thing those AI companies don't make money selling tokens!

[-]

InsideYork@reddit

The server load can cause massive timeouts and no output vinstead of having any questions answered

[-]

gambiter@reddit

If servers are getting overloaded, you scale up. Reducing load by making your product worse should only be done when leadership refuses to scale more.

They've raised over $72B so far, so... I have a feeling they can afford the extra server instances that would keep them from getting overloaded, they just don't want to afford it.

[-]

InsideYork@reddit

Did you ignore the Nvidia backlogs until 2028?

[-]

gambiter@reddit

Your inability to buy a GPU as a consumer isn't the same as a multi-billion dollar company's supply chain.

But if we grant that Anthropic simply can't keep up with demand, that's worse. Do you understand that? Because that means they sold a service they can't provide, and intentionally bait-and-switched customers who paid for a different product.

From my perspective, it's greed. From yours, it's incompetence. I don't think your version makes them look better.

[-]

InsideYork@reddit

Anthropic bad, ok?

[-]

gambiter@reddit

That's exactly what someone says when they know they can't defend their position.

What exactly is your goal here? Are you implying that because I said something negative about Anthropic, my argument is void? Surely that isn't what you mean, because that would be fucking stupid, but your inability to make your point leaves that as the only option.

[-]

InsideYork@reddit

Why are you mad?

[-]

dizvyz@reddit

Same with resetting peoples sessions every hour. That's such an obvious bug that they are either not qualified to code (even with LLM help) or they are lying about it being a bug.

[-]

69liketekashi@reddit

Well yes and no, at some point people would notice this and switch to another model

[-]

gscjj@reddit

It probably balances since you’re not charged for the extra thinking tokens too.

[-]

ThisWillPass@reddit

Sure if you don’t care about your own load or time.

[-]

CalligrapherFar7833@reddit

Wait but they are making money from tokens !????? /S

[-]

Look_0ver_There@reddit

Yeah, exactly this. The reasoning doesn't make logical sense. If the servers are overloaded, then you'd want to "one-shot" answers more than not, and then stick people's requests into a queue.

In fact, a QoS based queueing mechanism would make far more sense than whatever it is they're saying they did.

[-]

spencer_kw@reddit

this is the whole argument for local in one headline. not performance, not cost, just reliability. i can pin an exact quant of an exact model and it behaves the same today as last week. no silent updates, no "oops we broke caching," no postmortem two weeks after everyone already noticed

for anything that matters in production you're building on sand with hosted apis. they can change the model under you whenever they want and call it an improvement

[-]

Lesser-than@reddit

This is every SaaS in a bottle, if you sign up for a service you will never be in charge of the what the service can or will do. So for that reason you can never be sure what worked yesterday will work today.

[-]

marcoc2@reddit

It is so bizarre that these companies normalized changing the model quality for whatever they want

[-]

gthing@reddit

It is so bizarre that nobody read the article and understands what happened, which had nothing to do with model quality degrading.

It's like me making my app bloated and less efficient and then running around saying AMD is making CPUs slower.

[-]

marcoc2@reddit

Yeah, lets believe their article

[-]

gthing@reddit

If you want to believe something else that's totally fine. But if you want me to believe something else I'll need some evidence.

[-]

little_breeze@reddit

They're just blatantly committing fraud at this point

[-]

LosingID_583@reddit

This is why they don't want you using 3rd party harnesses btw.

They want to get you used to their 1st party tools, so you can't just easily switch to a different model in some 3rd party harness setting when they pull this sort of stuff. It's the Apple walled garden approach. Don't get trapped.

[-]

jazir55@reddit

"On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode. This was the wrong tradeoff. We reverted this change on April 7 after users told us they'd prefer to default to higher intelligence and opt into lower effort for simple tasks. This impacted Sonnet 4.6 and Opus 4.6."

On March 4th we decided to intentionally degrade Claude Code, now that you caught us we decided to reverse it, oops!

[-]

temperature_5@reddit

Interesting that there is no mention of quantization, as that is the most common accusation.

[-]

Expert_Job_1495@reddit

100% agree.

[-]

Hanthunius@reddit

Anthropic's phrasing is so disingenuous. "changed from high to medium", instead of "reduced".

[-]

Whole_Ad206@reddit

Y mira que me gusta opus lo use durante 2 meses este año y genial, pero hay que admitir que ahora al menos los próximos meses son de gpt5.5, si no hubiera competencia nos mearian en la cara, hay que ir saltando de uno a otro.

[-]

Tyler_Zoro@reddit

In each of these they made conscious choices to lower server load at the cost of quality

One of those was a change to defaults that could just as easily impact local models if the framework being used altered its defaults. The other change was a literal software bug that, again, could just as easily impact local models.

[-]

my_name_isnt_clever@reddit

My takeaway: Don't just use local models, also use open source scaffolding. Why would anyone put the models into their own control and then throw all the benefits away by still using Claude Code with this history of fuckery?

[-]

Tyler_Zoro@reddit

Don't just use local models, also use open source scaffolding

Wait, how do you propose using local models without using open source scaffolding? What are you talking about?

You can go the other way around, of course. You can use local tools to talk to remote APIs. But I have no idea how you think you could use a local model with someone's proprietary service.

[-]

my_name_isnt_clever@reddit

Claude Code is proprietary software, built for Claude and only Claude. Plenty of people set up bridges or Anthropic-compatible APIs just so they can use it. Instead they could be using literally anything else.

[-]

dizvyz@reddit

A bug caused this to keep happening every turn for the rest of the session instead of just once

If they are not lying about this nobody should be touching any software coming from these guys. This is worse than amateur coding.

[-]

ttkciar@reddit

On one hand you're right, in that a professional software engineering team will have automated tests which changes must pass before the changes are allowed into production.

The fact that this bug was not caught says to me that either they do not diligently practice testing before deployment, or they do not have good tests. Either way it reflects badly on them.

On the other hand, a depressingly many tech companies fail to clear this bar, so Anthropic probably isn't any worse than many of the other companies whose products you depend upon in your day to day life.

It would be very nice to live in a world where more tech companies follow industry best practices, but the sad matter is that we do not live in that world.

[-]

dizvyz@reddit

The first thing that came to my mind was also tests. This is such an obvious thing to test for and it shouldn't even be difficult.

[-]

buyurgan@reddit

don't believe anything Anthropic says, they act on purpose, be 'transparent' accordingly. it is probably all planned. they did many other things but they don't announce.
compared to 2-3 months ago, I think 20x max quota is dropped like at minimum 2-3x. not even counting about the drop in quality or counting opus 4.7 tokenizer changes.

[-]

Haeppchen2010@reddit

I use them via OpenCode and AWS Bedrock and also experienced phases of reduced quality, as have colleagues, too. In this case it’s likely not client side or sampling parameters. Good that my Qwen at home is always the same….

[-]

GatePorters@reddit

Lmao Anthropic sends its models to Jupiter.

[-]

Areign@reddit

isn't the first one changing UX to medium effort rather than high and users could go back to high if they wanted?

also the last one isn't changing how dumb the model is, its trying to get it to perform better isn't it? like the number one complaint most people have about how claude codes is the unnecessary verbosity of the changes. Also again, people are acting like the model is significantly stupider, being less verbose isn't really the same thing.

the second one is pretty bad, reducing context is certainly in the realm of what people are complaining about, but its a corner case, i dont think thats driving the complaints of widespread quality degredation.

[-]

WithoutReason1729@reddit

Your post is getting popular and we just featured it on our Discord! Come check it out!

You've also been given a special flair for your contribution. We appreciate your post!

I am a bot and this action was performed automatically.

[-]

OnlineParacosm@reddit

So businesses are supposed to fire their staff and then replace them with Claude agents potentially run at a cognitive 50% and you won’t know it until a month and a half later.

That’s one hell of a service level agreement

[-]

s101c@reddit

How to sabotage AI integration with one simple trick

[-]

Kitchen-Year-8434@reddit

Hanlon’s razor. I’m sure it was a mix of well meaning good intention plus self serving need to optimize infra and it had unintended consequences.

Agree though that the true remedy for this is self hosting and/or far greater transparency. If we had obvious release notes with the above changes it’d have been trivial to root cause and revert or remedy with local harness config.

[-]

doodlinghearsay@reddit

Hanlon’s razor.

Most overused heuristic ever.

[-]

Kitchen-Year-8434@reddit

Say more? I find most people are stupid and/or incompetent. A handful are sociopaths. Just a numbers game.

[-]

doodlinghearsay@reddit

Say more?

What more is there to say? It is often used when it doesn't apply.

I find it's often used as a soft excuse, to downgrade deliberate wrongdoing to a mistake. Calling the action stupid is useful, because it is emotionally satisfying for the aggrieved party, but doesn't carry nearly the same kind of consequences for the perpetrator as the correct explanation would.

[-]

my_name_isnt_clever@reddit

The exact opposite happens far more, where people just want to be mad and so they attribute any actions of an entity to malice. It's happening in this exact thread right now.

[-]

doodlinghearsay@reddit

people just want to be mad

Sure.

[-]

buppermint@reddit

Not sure how much of it was well meaning. Even now the the latest Opus in the Claude UI only supports "adaptive" reasoning, which means it rarely reasons at all even for hard science/coding problems. Well, at least I got a good lesson about never using closed weight models for any conversation you want to retain control over.

[-]

Kitchen-Year-8434@reddit

Oh; good point. I only use claude through claude code where I can easily configure that. May be a very different experience in Claude UI.

I run ChatGPT at extended thinking pretty much 100% of the time. I can't imagine being forced into adaptive only; I'd always be saying shit like "think super ultra long and hard about this and show your chain of thought", which is a big UX regression IMO.

[-]

spaceman_@reddit (OP)

I agree that this was likely not intended to lower quality, but:

Lack of transparancy and not disclosing these user affecting changes was a choice
The drop in quality is not unpredictable based on the changes made

[-]

Kitchen-Year-8434@reddit

I strongly agree on both counts. We should have patch notes about changes like that and be able to see and modify those prompts clearly.

But then you end up with the "Chinese labs are stealing our IP REEEEE" problem.

It's the whole "my super secret sauce is clear text in .md files" writ large.

[-]

m-shottie@reddit

* And the gaslighting when said changes directly affected people and they voiced their concerns, and rather that than admitting the changes they _knew_ they made but hoped wouldn't degrade anyones experience, they opted to say it was user error and double down on it.

[-]

micseydel@reddit

the gaslighting when said changes directly affected people and they voiced their concerns, and rather that than admitting the changes they _knew_ they made but hoped wouldn't degrade anyones experience

I'm not disagreeing, just trying to build up my notes with sources... do you have a go-to example (ideally with a link or exact words I can websearch for) that contradict the latest most strongly?

[-]

m-shottie@reddit

Basically loads of it is on X from the anthropic team. Lots of references in this sub too. If I have some time/energy when I'm on my computer next I'll have a look and get some links.

But.. it's pretty easy to find, just look for people complaining over the last 1.5 months to get started.

[-]

challis88ocarina@reddit

...and vibe coding

[-]

Kitchen-Year-8434@reddit

This is true.

[-]

mrdevlar@reddit

Hanlon’s razor.

I guess it depends on whether or not you consider enshitification to be malicious or not.

[-]

Kitchen-Year-8434@reddit

re: the "make it less verbose", there's a real tradeoff between how much noise you have in your context window vs. signaling. All tokens in reasoning are not created equally, and there's a lot of fat and wasted generation in reasoning.

Of course, it's a nondeterministic system so you go changing a system prompt trying to make things more terse and how do you know if you've preserved the performance or accidentally gut reasoning chains? Answer: you don't.

Hence the "local inference k thx" piece. And/or more transparency.

Current netflix bumping prices again w/out adding more value and flooding their platform with "unscripted reality TV": enshittification. Dropping the ball trying to optimize something to save costs and/or keep quality while accelerating outcomes: not enshittification.

Yet.

[-]

eushaun99@reddit

I don't think this is enshittification though, if they willingly post an explanation I'm willing to give them the benefit of the doubt that this was unintended consequences of optimising infrastructure.

[-]

tens919382@reddit

This has nothing to do with weights though. The changes they claim, were all on the claude code harness.

[-]

my_name_isnt_clever@reddit

shhh, don't ruin the circlejerk

[-]

JockY@reddit

Anthropic admits to have made hosted models more stupid

I hate this inflammatory emotion-led headline nonsense. They did not "make the models more stupid" nor did they make an "admission", it's just trying to spin a narrative that never happened.

[-]

dwrz@reddit

If a hosted model has been quantized or in some way had its capabilities reduced, I should get a discount. The price should be per quant.

I am so grateful for what I can do now with llama.cpp and Qwen 3.6 27B.

[-]

my_name_isnt_clever@reddit

I was saying this when reasoning models were new and o1 was hiding them, I don't want to pay for tokens I can't even see. But nobody actually cares and so nothing changes.

[-]

margielafarts@reddit

Qwen cannot compare to opus at all

[-]

spaceman_@reddit (OP)

As far as I can tell, they didn't quantize the model but "optimized" other settings, such as reasoning level, system prompt and cache eviction timeout.

[-]

Finanzamt_Endgegner@reddit

I have a feeling that when people complain about lobotomy it's not just the weights getting quantized but kv cache and that to like Q4 and a good model gets shizo lol

[-]

Makers7886@reddit

For sure optimizations are being rolled out onto apis. You can just look at the day 0 vLLM/SGLang configs - hoppers + optimizations. Imo 8bit kv cache was the norm over the last year and I imagine they will throw any optimization they think they can get away with.

[-]

rz2000@reddit

In March I cancelled my Claude subscription after getting moronic replies for a couple days. I thought they were serving a highly quantized model, not just reducing the thinking stage.

Hopefully, I contributed to them changing course, but I don't think I'll re-subscribe. I can continue to try it through the Kagi Assistant, and use local models or Gemini for everything else.

[-]

xjE4644Eyc@reddit

Local models, yes, open weight no. The open weight providers do the same shit all the time.

The only way to ensure that you have a consistent model is host it yourself, not others, including API.

[-]

EconomySerious@reddit

Since they Quality was not maintained why i don't SEE people asking for refunda of the tokens waysted on that periods and if course monetary compensation

[-]

AcePilot01@reddit

And didn't they JUST say they DIDN'T nerf them? lmfaooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

This laugh is more at you all for not cacneling subs more than at them tbh, they do it because they know not one of you simps have a spine at all, just pure skins, no bones, none.

[-]

SysPsych@reddit

Eventually I have to expect the result will be regulation for APIs. If you advertise a particular service, people must get what they are paying for. Something more concrete than "Trust us".

Otherwise I'm awaiting the eventual scandal where someone sells access to their secret sauce API which under the hood is just Claude/Codex, and once enough people sign up at the cheap rate, they rewire everything to a 2B parameter nonsense cloud model and pocket the money from the people who subscribed and don't check their monthly charges often.

[-]

Middle_Bullfrog_6173@reddit

Technically these are all Claude Code bugs and the model and api was unaffected.

You avoided all of these if you used an open harness with Opus/Sonnet. And were hit by them if you used Claude Code with a local model.

[-]

Evening_Ad6637@reddit

I am an API only user and I have to disagree. Opus-4.7 became pretty brain-damaged at some point. I reported my observation here on Reddit that qwen-3.6-35B-gguf produced much better code than Opus-4.7, which was very surprising to me.

It was not the case for Opus-4.6, but with 4.7 there definitely was something wrong at server side. I’ve tested with different harnesses and even even with a non-agentic client with just a simple chat, no system-prompt, tools etc. The produced code was horrible.

I think that was like four/five days ago

[-]

NandaVegg@reddit

What was wrong with Opus-4.7 in your use case? It is generally ranked worse than 4.6 on AB test/vibe-code arena type benchmark so it seems a regression in some areas.

https://www.designarena.ai/leaderboard

[-]

ilintar@reddit

"Oops, we removed interleaved reasoning history, it took us 2 weeks to realize" is actually pretty funny :)

[-]

portmanteaudition@reddit

You clearly did not read the full post. The major reason for performance seemingly declining was the change to default settings which could always be changed (for free) to produce "less stupid" results. This saved people who used defaults money, improved latency, and reduced use of models that were overkill to eat up model limits.

[-]

Commercial-Chest-992@reddit

Yeah, we'll make models stupid on our own, thanks.

[-]

FormerKarmaKing@reddit

https://www.reddit.com/r/ClaudeCode/s/OVChfgtTKr

Not OP. But this post is the best quantified data I’ve seen so far on how bad it got.

Personally, I don’t run local… yet. But effectively losing a week of effective work because my $200 / month vendor decided to short me for their benefit will not be forgotten.

[-]

micseydel@reddit

Thanks for the link, I thought that post was interesting but I thought this comment was more interesting https://www.reddit.com/r/ClaudeCode/comments/1snhyck/comment/ogmt8z1/

[-]

JacketHistorical2321@reddit

The user can change the effort level you know? In terms of that they're just mentioning that they changed what Claude code defaults to.

[-]

R_Duncan@reddit

These kind of tests shouldn't be done in production, not when you're selling a service, not from a reputable company.

[-]

gthing@reddit

Anthropic agrees with you and said they were going to be doing that in the future.

[-]

SeekingTheTruth@reddit

Maybe these guys should do a bit of A/B testing of their changes?

[-]

mrdevlar@reddit

People aren't stupid, they recognise what the tech industry did to all of its offerings after consolidation, we're living in it.

We require open weights as it guarantees that they cannot make the service worse, because the they always have an open competitor who will not.

[-]

gthing@reddit

Misleading title and post. Did nobody in the comments actually read the article?

We never intentionally degrade our models, and we were able to immediately confirm that our API and inference layer were unaffected.

They broke Claude Code - the model harness. And they did it unintentionally. It has nothing to do with the hosted models being degraded. This is either intentional FUD or OP and commenters here do not understand the difference, and I'm not sure which is worse.

[-]

kmp11@reddit

this is a problem when an enterprise client tries to develop a production tool around an engine that varies in quality.

[-]

realmosai@reddit

I was working hard on a project last month and Opus + Local AI was very good, up until one day it suddenly started to hallucinate and utterly ruined the entire feature branch with unnecessary edits, and unsanctioned changes. I was very surprised it looked like the work of a 9B param model, I even checked my local setup to see if I had accidentally switched to a 9B qwen or something. Then the next three days were a nightmare - Opus would pretend to understand everything but hallucinate and forget instructions in the very next message. Took me five more days to fix the damage on a half-build feature. I unsubbed for a month.

[-]

mantafloppy@reddit

They introduces bug and changed default setting to lower server load to improved service to all their users.

They never intended to lower quality like you imply.

On each start you see the Effort level that is chosen, its not hidden at all.

A bug is a bug, their no evil intent behind it.

You are a bit delusional.

[-]

Impressive-Sir9633@reddit

I am just baffled that people don't see the downsides of hosted models. I am a paying Claude customer and like their models for specific use cases. But when it comes to private stuff, regular mundane stuff etc, I don't want or need their hosted models.

For most privacy focused use cases e.g. phone dictation, tax document parsing etc, I use local models. Apple's Foundational models are excellent for basic use case.

https://apps.apple.com/us/app/dictawiz-voice-to-text/id6759256382

[-]

Evening_Ad6637@reddit

Is this opensource or why are posting the link?

[-]

Impressive-Sir9633@reddit

If you are one of the llama.cpp developer, I really appreciate what you are doing for privacy! Thank you!

I posted the links so people can see how easy it is to run local models larger models like Qwen3 TTS. Most people don't realize the local capabilities until they try it themselves. You don't need GPUs etc.

On the website, you can even run voice cloning, TTS etc locally using webGPU within your browser. https://freevoicereader.com

[-]

TheMericanIdiot@reddit

I don’t think Anthropic knows what they’re doing…. They have raw power that’s saving them. They’re not surgical and still they are in the net negative.

[-]

tspwd@reddit

I’m a huge Claude Code fan, paying Max Plan subscriber, but recently Anthropic is doing everything in their power to push developers away.

[-]

sine120@reddit

To be fair to Anthropic, this is the first time I've seen them own one of their fuck ups. They've had like 5 in the past month, so that's 20%, but that's a 20% improvement.

[-]

ComplexJellyfish8658@reddit

All of those are client side changes in Claude code. The title should be updated to reflect.

[-]

ai_without_borders@reddit

the admission covers the reasoning effort flag (thinking token budget) but the production inference stack has multiple quality-affecting layers beyond weights: kv cache eviction policies, kv cache quantization, batching strategies. the visible change was reasoning effort but kv cache quantization is real and harder to detect — at q4 on long-context requests it degrades multi-step reasoning subtly. thats the actual argument for local: not just weights are unaltered but visibility into the full inference stack. you can see and tune every parameter. with hosted you are guessing at which optimizations are currently active.

[-]

jdbow75@reddit

I agree that open-weight models are more reliable, consistent, and that Anthropic changed Claude Code (not Claude models). Unsure why "made hosted models more stupid" is in the title of this post, though? Maybe that is a thing, but we don't know, because hosted models are a black box to us.

[-]

GuardSeparate2727@reddit

Claude is the dumbest most useless piece of software I have ever fucking used in my life. It will overactively refuse to do anything and all the stories about making programs with it are just astroturfed lies.

You can outright tell it how you made your program and it will just go on a schizophrenic rant and talk about xenu's master plan for no reason. You can also outright tell it factual information and it will just keep hallucinating on demented levels that make reddit bots look like intellectual scholars by comparison. Im convinced it will tell US highcom to blow up the pentagon at some point. Its just that dumb and dangerous.

[-]

jeekp@reddit

and this is just what they're willing to admit

[-]

ComeFromTheWater@reddit

Opus 4.7 is the genius kid who’s lazy as fuck. Sonnet 4.6 is the type A go-getter premed who has a bit less raw intelligence but infinitely more capable because it actually tries.

Seriously if I didn’t know better I’d say Opus 4.7 is a pothead

[-]

Quanzitta@reddit

I got to say, the Claude in periplexity is lobotomized

[-]

One_Whole_9927@reddit

...Meanwhile these jackasses are lobbying against open source framing it as a China problem.

[-]

relentlesshack@reddit

The couldn't possibly have a profit motive /s

[-]

LegacyRemaster@reddit

It's called artificial intelligence. Stupid on command, if necessary.

[-]

eli_pizza@reddit

All three of those are Claude Code issues. The model was done.

[-]

gebuswon@reddit

Although some users are able to afford hardware to run these models locally, Users running older hardware like a RX580 are effectively screwed.

Only hope would be models like Bonsai 1b quantized models or hardware prices falling back to reasonable prices.

I for one am patiently waiting for low-spec hardware models to help reduce my costs and reliance on commercial AI

[-]

bidibidibop@reddit

"Admits to have made well-intentioned but in the end damaging changes to their harness (and not their HOSTED MODELS)" just doesn't carry the same weight now does it

[-]

ieatdownvotes4food@reddit

all the companies are optimizing for engagement and token use. anything that's too good where you're just in and out quickly works against them and their numbers.

[-]

Dudensen@reddit

https://x.com/bcherny/status/2041199126076182683

lol

[-]

FormalAd7367@reddit

For those of us who invested in our own set up, we had predicted the frontier models are kicking us out due to their change in business model and it’s happening right before our eyes

[-]

Technical-Earth-3254@reddit

We need a law to publish weights of ai models. Not saying they need a MIT license, but something needs to happen. How are these providers allowed to make changes like rate limits to paying user without further notice or whatever. This seems borderline illegal and is absolutely anti-consumer.

[-]

t4a8945@reddit

I'm so happy to have bought a dual spark setup, no more shenanigans, no more ToS.

[-]

dead-end-master@reddit

Its for selling the pro max mega pack ++ Ultimate premium ass crack for only 5837482$ per months

[-]

g_rich@reddit

I’ve been saying this for awhile now whenever someone complains about the drop in quality of Gemini and Claude. It has nothing to do with a drop in quality of the new models and has everything to do with managing resources.

I don’t think people realize how large these foundation models are and the resources required to run them at scale.

[-]

MomentJolly3535@reddit

I honestly think that they did way worse than that, at one point Sonnet 4.6 was output 50 emojis per messages, everywhere (the chatgpt 4o used to answer).
The intelligence was way below 4.6, something like gpt oss 120B's intelligence.
(for reference it was on a free account, so i dont mind not being a priority for them, but displaying "Claude Sonnet 4.6" while its not, is a huge redflag)

[-]

Inevitable_Raccoon_9@reddit

There are managers and a CEO signing of on this! Such decisions never are done by low ranking people.

[-]

Zeeplankton@reddit

I don't think this is a good example. It's very hard to build a harness, and when you're serving millions of users with something so non-deterministic as an LLM, even a slight shift in a system prompt or bug, a vocal amount of users will notice.

We’ve traced these reports to three separate changes that affected Claude Code, the Claude Agent SDK, and Claude Cowork. The API was not impacted.

I think this is an example of just stupidity not malice. Given how vibe-coded CC is this seems par for the course for the company.

For me personally it's fucking annoying as hell to try to build smart context management in my app, and just tweaking system prompts slightly can really fuck things up.

[-]

mister2d@reddit

While I don't like the guy who called them "Misanthropic", it sure is appropriate.

[-]

Dry_Yam_4597@reddit

I can tell. 4.7 is basically a noob with 10 years of experience and the title "principle engineer". takes the product offline couple of times a day and then gets philosophical.

[-]

xienze@reddit

It's not wrong to be upset at them, but it's kind of funny coming from a sub where people claim absurdly small quants are "just fine, with barely any loss!" as they struggle to make some cutting edge model fit on their 8GB card.

[-]

AnxietyDisastrous613@reddit

Bot?

[-]

spaceman_@reddit (OP)

At least it's a concious choice when you use a quant.

Also, the changes made by Anthropic did not alter the model directly, but the system prompt and default parameters, as well as cache eviction (going from 1h to 5m), so not really a straight apples-to-apples comparison.