Only LocalLLaMa can save us now.
Posted by kaggleqrdl@reddit | LocalLLaMA | View on Reddit | 65 comments
https://github.com/anthropics/claude-code/issues/46829#issuecomment-4233122128
PhillyG17@reddit
This is the same sort of thing that happened in the early "Wild Wild West" of the internet. First it's all about making the best product, then it shifts to making a profitable one. Once these companies have to show they can actually make a profit from these models, they are going to tighten the control over their product and make the customer so reliant that they feel like there is no other option. We've already seen Anthropic nerf their models and limit usage and I don't think this is the last time something like this will happen. The positive thing is that smaller models are getting better so if the sub 70b parameter models could ever get close to the current frontier model performance, local models are going to be much more attractive.
Long_comment_san@reddit
You're not wrong. At some point there will be a model that turns out to be the "last megacorp public model ever".
seamonn@reddit
Hopefully that's where new startups step in
Sydorovich@reddit
Don't worry, they will be deemed as extremist danger to society/freedom/democracy/nazism/racism/terrorism/etc. and regulated out of most western markets.
More-Curious816@reddit
Isn't that their plan from the beginning? Only fools would believe otherwise. Give the public the drug, an actually good drug, for cheap, let them get hooked on your product, then rug pull the shit out of them. Squeeze every penny from them and keep them dependent on your product indefinitely. Literally corporations-for-max-profit 101. In the words of their own models: they are not benevolent leaders, they are opportunistic bastards.
Only fools would let Sam or Dario dictate who can use AI and for what. According to the orange site, Sam started inserting ads and Dario is using Persona for digital ID to gate some features of Anthropic's models. Like, what the actual fuck. Why does every CEO need this level of predatory behavior?
Yusso_17@reddit
I am building a local AI app; maybe if you check it out, you can find out if that can save us too.
Disposable110@reddit
All the main providers want to move away from seats/subscription plans and either charge usage per token or lock customers into provisioned throughput plans.
There is a massive datacenter shortage and they can charge enterprise through the nose, as all major corporations are outbidding each other for usage.
The subsidized consumer plans were nothing more than a marketing tool that actually costs them a lot of money.
LA_rent_Aficionado@reddit
I'm not entirely sure about this; a subscription model can be far more profitable if they get the budgets and demand forecasting right - any underutilized capacity is straight profit.
Disposable110@reddit
If you have a limited quantity of compute cycles, then selling 1 compute cycle to enterprise for $10 is preferable to selling 1 compute cycle to an arbitrary number of consumers for $2. It's not even a profit problem, it's an opportunity cost problem.
This is pretty much how NVIDIA gives the finger to consumers and small businesses: 2 customers make up 40% of their revenue, 10 customers make up 85%, while all the consumers and small businesses in the world don't even account for 5%. 99% of the world is simply not worth NVIDIA's business.
Demand exceeds available compute, therefore the scarce good goes to the highest bidder. If Mukesh Ambani pays 5x market rate for all available compute Anthropic has and books it for the next 3 years, Anthropic would be stupid not to kick out all consumers to free up compute for Mr. Ambani.
Begun the token wars have.
Hold on to your 3090 workhorses!
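The opportunity-cost arithmetic in the comment above can be sketched in a few lines (the prices and capacity are the comment's hypothetical numbers, not real figures):

```python
# Hypothetical numbers from the comment: a fixed pool of compute cycles,
# enterprise pays $10/cycle, consumers pay $2/cycle.
CAPACITY = 1_000_000          # compute cycles available
ENTERPRISE_PRICE = 10.0      # $/cycle
CONSUMER_PRICE = 2.0         # $/cycle

def revenue(enterprise_cycles: int) -> float:
    """Total revenue if the rest of capacity is sold to consumers."""
    consumer_cycles = CAPACITY - enterprise_cycles
    return enterprise_cycles * ENTERPRISE_PRICE + consumer_cycles * CONSUMER_PRICE

# Every cycle shifted from consumers to enterprise nets $8 more, so under
# scarcity the rational (shareholder-interest) allocation is all-enterprise.
print(revenue(0))         # all-consumer
print(revenue(CAPACITY))  # all-enterprise
```

At these numbers the spread is 5x ($2M vs $10M per pool), which is the "opportunity cost, not profit" point: consumers don't have to be unprofitable to get dropped, just less profitable than the next bidder.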
LA_rent_Aficionado@reddit
Great points, and what I was alluding to with my forecasting comment: ensuring demand and supply align, part of which is whether the infrastructure is static or can scale. If compute is inadequate for all the demand then absolutely, they'll have to consider opportunity cost.
That said we’re making some good progress on the efficiency front so hopefully developers will continue to pour resources into being able to scale throughput within existing hardware.
Disposable110@reddit
Absolutely, I feel frontier models are quite inefficient. A trillion parameter model may be a step change (duh) but I feel that there's still so much optimization to be done that Mythos equivalent models will eventually get compressed down to 10% or less of their size.
It wouldn't be the first time: if you ever had the displeasure of trying to run ancient massive models like Wu Dao locally, you'd know how ridiculously unoptimized that sort of stuff used to be and how little bang for the buck it gave.
FullstackSensei@reddit
Who would've thunk that heavily subsidized plans were unsustainable?!!! The tech bros said it'll get 10x cheaper every year!
MasterKoolT@reddit
Who said that? What I've been hearing from Nvidia and the big AI players is a need for massive investment in additional compute capacity, which obviously needs to be justified with an eventual ROI
finevelyn@reddit
"Sam Altman says the cost of using AI will drop by 10 times every year"
https://finance.yahoo.com/news/sam-altman-says-cost-using-041517310.html
MasterKoolT@reddit
That's a misleading headline. Here's the actual quote:
"The cost to use a given level of AI falls about 10x every 12 months."
That's obviously not the same as this year's frontier model being 10x cheaper than last year's. He's just stating that the cost of a unit of compute is dropping by an order of magnitude each year, which probably isn't far off from the truth.
finevelyn@reddit
Slightly misleading, but not really. What he said isn't true anyway. Nothing is priced lower than it was a year ago: the same level of hardware is much more expensive, and there's no cheaper subscription tier available from any provider.
FullstackSensei@reddit
Hoards of people on this sub who drink the tech bro's coolaid.
Every time there is talk about the economics of local inference, you'll see them come out to defend how you can't beat $20/month Claude Code plans...
I got downvoted so many times for saying I know people who pay for the $200/month Max plan and don't make it to Wednesday before they hit their weekly quota.
crantob@reddit
"Hordes" btw, if you want to appear smarter than others.
rpkarma@reddit
On the other hand, I hoard local model weights
Firm-Fix-5946@reddit
also bros not bro's
LA_rent_Aficionado@reddit
Exactly, basic economics.
Conversion pricing and long-term pricing are well-established practices in business. Where are the long diatribes about internet or cell phone providers offering 50%-cheaper first-year pricing for their services? There's so much entitlement on this sub.
MasterKoolT@reddit
There's also plenty of competition (at least four major US players plus all the Chinese labs ripping off their work) so prices are being pushed as low as they realistically can be. It's not a crime to build toward a sustainable business model that earns a profit
LA_rent_Aficionado@reddit
Totally, and if people want to continue using these frontier models and hopefully maintain access to open models, they should recognize that a shift to actual revenue and profitability will help make these technologies and advancements sustainable. Compute, research, etc. expenses are astronomical; that level of capital investment often requires an ROI, well beyond the capacity of non-profit investors.
FORLLM@reddit
The same level of intelligence has gotten massively cheaper every year, but of course few want last year's intelligence.
To your point, it was always a predatory pricing model. I'm not mad about it, it was inevitable. I've tried to make the most of it. Hope everyone has too. We'll have better local tools available if we all use the subsidy before it's gone.
FullstackSensei@reddit
Thing is, most of those who thought it'd be like Netflix missed the cheap-hardware bandwagon.
I never used paid plans because I didn't want to get complacent, and because I knew I could never use those cloud models for work. So I loaded up on hardware while things were still cheap. It's paying dividends now.
Orlandocollins@reddit
0 regrets buying my 2 RTX Pro 6000s. Though I am worried that companies are going to stop doing open models. Something feels off about this last wave of models, how they were released, and the tone of the companies regarding what comes next.
rpkarma@reddit
The Chinese companies are stopping it. They're also jacking up prices and having to turn a profit too, and open models no longer help them.
a_beautiful_rhind@reddit
I don't regret my hardware either but this is only a stopgap. They can and will stop uploading weights. You got what you got and that's all folks.
Orlandocollins@reddit
Yeah I wonder if the next step is going to have to be the community helping fund training and then they are given access to the weights
a_beautiful_rhind@reddit
That's definitely going on hard mode, considering the hardware and know-how required.
soshulmedia@reddit
I think that would be solvable with crowdfunding of good model teams. I am certainly willing to spend a bit more than a typical closed-AI subscription to get open weights updated every year or half year or so. And if you make it so that the model weights become open eventually but are available to the crowdfunders early, I think this could be a viable model.
And I think I am far from the only one willing to pay for "model training" if I (eventually) get open weights from it.
a_beautiful_rhind@reddit
How do you even get people to agree on a size? MoE vs dense, vision, etc.
soshulmedia@reddit
First, I think it is still a bit out and might not arrive very soon; we are still getting very nice open-weight releases from big corporations for free.
I think the answer, even if it sounds cliche, would be "the market decides" or maybe in more concrete terms, appropriate market research, or maybe even more concrete, something like this:
Say there is a web site to facilitate this crowdfunding. "AiCrowd".
A team that already has a good reputation (e.g. Qwen researchers or folks who have already done model releases) could then offer various retrainings / extensions / research for various promised model sizes, or maybe a range of hardware parameters (to give them more flexibility), with a minimum funding needed by date X or else everyone gets their money back. If the funding is met, they can use the funds (or even extra funds, if available) for training, and all funders get (maybe staged) access to the weights along the way and after release. And a set period after that, the weights become public for anyone to use.
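The all-or-nothing flow described above (fund by date X or refund, then staged weight access) can be sketched as a toy state machine. All names here (`Campaign`, `pledge`, `settle`) are hypothetical, not any real crowdfunding API:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Campaign:
    goal: float                      # minimum funding needed
    deadline: date                   # "date X" from the comment
    pledges: dict = field(default_factory=dict)

    def pledge(self, backer: str, amount: float) -> None:
        """Record (or top up) a backer's pledge."""
        self.pledges[backer] = self.pledges.get(backer, 0.0) + amount

    def settle(self, today: date) -> str:
        """After the deadline: release funds if the goal is met, else refund."""
        if today < self.deadline:
            return "open"
        if sum(self.pledges.values()) >= self.goal:
            return "funded"      # team trains; backers get early weight access
        self.pledges.clear()     # everyone gets their money back
        return "refunded"
```

The "weights become public after a set period" step would just be a second date field checked the same way; the core mechanism is the refund-if-underfunded escrow.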
RelicDerelict@reddit
I like that idea. Architecturally the models are mature and we just need to keep them up to date. I am in for some kind of subscription, rather than giving it to US AI hunter-killers.
Orlandocollins@reddit
I know nothing about training (I know, weird given my hardware), but I wonder if there is a world where, like Folding@home, we can offer up our hardware to help train the models too.
RelicDerelict@reddit
Wow, that's a brilliant idea. I remember running BOINC back when this world used to be a better place.
cafedude@reddit
What about groups like the Allen Institute for AI? OLMo has been kinda meh so far, but the future of open models is probably coming from non-profits like that.
Opteron67@reddit
dual 5090
Ok_Technology_5962@reddit
Yea, the last launch of models was... strange... Seems like they are all hitting a point of usability, and companies are considering whether they are just kneecapping themselves, as 27B is on par with 397B in a small set of instances.
asfbrz96@reddit
We have to thank Cursor for basically stealing Kimi K2.5.
Defyz89@reddit
This isn't uniquely Anthropic — it's the unit economics of consumer AI. Power users burn enterprise-tier prices. The question was always when they'd route us down, not if.
The mechanism behind "silent degradation" is almost always the same — A/B testing quantized or distilled variants under the same model name. No user notification. Regressions get discussed in GitHub issues, closed as "expected variance," repeated. OpenAI had the same pattern with GPT-4 through 2024.
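One way to catch the "silent degradation" described above is a fixed canary suite run against the API on a schedule, flagging drops in pass rate versus a recorded baseline. This is a minimal sketch of the idea, not anyone's actual monitoring setup; how you call the model and what counts as a "pass" are up to you:

```python
def pass_rate(outputs: list[str], expected: list[str]) -> float:
    """Fraction of canary prompts whose output matches the expected answer."""
    hits = sum(o.strip() == e for o, e in zip(outputs, expected))
    return hits / len(expected)

def degraded(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Flag if the pass rate fell more than `tolerance` below the baseline."""
    return (baseline - current) > tolerance
```

Exact-match canaries are crude (sampling noise alone will move the number), so in practice you'd want many prompts, temperature 0, and a tolerance wide enough that only a real quantized/distilled swap trips it.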
crantob@reddit
Isn't competition grand?
What's sad is how few people have been trained to understand how it works.
Perfect-Flounder7856@reddit
And this is the reason why you own the infra, not rent. So I bought a DGX Spark. When do I upgrade my CTO from a 5070 Ti to a 4090 or 5090 or a pro card? Cuz man, he is all about cloud compute... but he also games...
Silver-Champion-4846@reddit
Is the only hope right now to start wildly experimenting on the already-released open models and try modifying architecture + seeking more data + better training algos? Like a bunch of modified llama3 and Nemo and Mistral3.2 and KimiLinear etc experiments?
RelicDerelict@reddit
Yes, can someone smart and capable do a community model?
Pleasant-Shallot-707@reddit
Nemo is full stack open and NVIDIA has no plans to change that
Silver-Champion-4846@reddit
more like multiple someones collaborating together with access to a beefy training cluster, with enough money to support it and enough patience and commitment to the ideal of FREEDOM to actually manage it
cafedude@reddit
Isn't that what places like Allen AI are doing?
Silver-Champion-4846@reddit
Maybe... still behind though, since they had been focusing on full-sized models before bitnet became more practical.
cafedude@reddit
Constraints can sometimes lead to interesting discoveries. Like the 1-bit models and diffnets, etc. We definitely don't have the resources that the big guns have, but that could be used to our advantage if we start joining forces.
Silver-Champion-4846@reddit
My thoughts exactly. But I haven't actually found a good bitnet model yet... especially one that supports my use case of Arabic text diacritization.
cafedude@reddit
It seems like the future of open AI models is probably going to come from places like Allen AI. Their Olmo models have been underwhelming so far, but in 2 or 3 years they might be one of the few still releasing open models.
pmttyji@reddit
Only recently I changed my plan and am getting 96GB of AMD VRAM (instead of 48GB of NVIDIA VRAM), as I want to run bigger models. Additionally, I'm getting 128GB of DDR5 RAM. So currently I can run up to 200-250B models @ Q4 with good context. But I really want to run large models like GLM 5.1, Kimi-K2.5, etc. too. Don't know when.
Hopefully new inventions, algorithms, papers, and resources could help with this over time. I'm also expecting recent things like TurboQuant, DFlash, DTree, 1-bit versions of models (like a 1T model at 100-200B size), and more optimizations in llama.cpp/ik_llama.cpp to bring some boosts soon.
Finally, we'll be getting better affordable devices with 1-2TB unified RAM at 2TB/s bandwidth next year. Also cheaper 96/128GB graphics cards. Affordable LLM burners too, with large 1T models.
HopePupal@reddit
i want all that stuff too, i want a pony, but i think what we're actually going to see is prices thru the roof and exactly none of that hardware
Silver-Champion-4846@reddit
That is the hope, only applicable to the rich indies. The poor indies are hoping for a lot more efficiency from 1.58-bit models, or equivalently fast & cheap good models.
Thrumpwart@reddit
I saw an interesting comment here the other day about how the surprise popularity of OpenClaw completely swamped the subscription frontier services, and they were forced to scale back their services to accommodate everyone.
elongated_argonian@reddit
Honestly, the more reasonable thing to do in that case is either ban OpenClaw or give it a hard token limit on the API. Pissing off the rest of your customers is an odd move, to say the least.
HopePupal@reddit
detecting OpenClaw is left as an exercise for the reader
so is disguising OpenClaw queries and traffic patterns to evade that detector
mantafloppy@reddit
Vibe coders discovering GitHub, thinking tickets/issues are like a forum, and using them to post social-media style, is wild.
kaggleqrdl@reddit (OP)
Nah, the issue was closed immediately and has largely become a forum for venting.
mantafloppy@reddit
Still not the first time I've seen that kind of behaviour, and it's more and more common.
floconildo@reddit
To be honest, the writing was on the wall for quite some time already.
Every major provider has been reporting losses (officially or not) on a per-usage basis for the past few years, and there's no clear solution yet to make it sustainable for non-corporate consumers.
Some are trying to subsidize via ads + military, others via companies, but one thing is certain: $10/month for Copilot is DEFINITELY not sustainable with the current technology.
The only good thing to come out of it is R&D on efficiency to cut OPEX, the gains from which can waterfall directly to end users. More with less is the only sustainable way forward if they want to keep on that path.
RedParaglider@reddit
You know who's going to make an absolute killing at AI and military use? Ukraine and their mountain of telemetry.
LA_rent_Aficionado@reddit
This is an interesting take, but it draws a few inferences and overlooks some nuance. As for-profit companies, the only logical arguments (in the interest of the shareholder) for the Anthropics of the world taking away access for consumer markets would hinge on the following assumptions:
1) B2B + B2C Demand > Capacity and,
2) B2B Profit > B2C Profit
Then logically a company is going to shed its least profitable business if it can meet the demand of its most profitable. But if AI companies can scale with demand and make their existing infrastructure more efficient, I don't see a logical argument for not continuing to provide a service to a consumer market as long as it is profitable and generates value for shareholders.
Regarding rug pulling in terms of price, this is only to be expected. As companies shift from revenue to margin growth priorities, it's only logical the pricing will evolve from "foot in the door" customer conversion pricing (often subsidized) to higher prices that guarantee adequate profit for shareholders. This is a sound and rational business decision, at which time, customers will have the opportunity to vote with their wallets.
Specter_Origin@reddit
In all honesty, I feel providers all over are struggling with capacity and scaling.