Opus 4.7 Max subscriber. Switching to Kimi 2.6
Posted by meaningego@reddit | LocalLLaMA | View on Reddit | 104 comments
I know people just like to throw shit at Anthropic. I'm not one of those. I have nothing against them as a company, and I actually dislike them less than the other big players. I had my whole team switch over from Cursor because Opus felt so good. Since the Max plan is never enough, expenses are growing bigger by the day. So when we can, we supplement with Qwen 3.6 while keeping Opus as the harness. It's good, but not *as* good. Lots of mistakes and stubs.
The feeling everyone shares is that Opus 4.7 suddenly got lazy, on top of being expensive. Part of the problem might be in the Claude Code CLI itself, who knows.
And so today I switched over to Kimi 2.6 and it's... wow! So fast and pleasurable to use. The context window is much smaller, but if you keep an eye on it, it's still pretty reliable. I immediately purchased a yearly subscription and will recommend it to my colleagues as well.
At the moment I'm using it with their CLI; it feels smoother than plugging it into Claude Code via env vars. I'm just a bit sad it doesn't work out of the box with Forge. I submitted a PR to fix it (https://github.com/tailcallhq/forgecode/pull/3098).
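For anyone curious about the env-var route mentioned above, the setup looks roughly like this. The base URL, token name, and model id here are assumptions on my part, not taken from the post; check your provider's docs before relying on them:

```shell
# Point Claude Code at an Anthropic-compatible third-party endpoint.
# All values below are placeholders/assumptions; verify against provider docs.
export ANTHROPIC_BASE_URL="https://api.moonshot.ai/anthropic"
export ANTHROPIC_AUTH_TOKEN="sk-your-provider-key"  # placeholder key
export ANTHROPIC_MODEL="kimi-k2.6"                  # hypothetical model id
# then launch Claude Code as usual:
# claude
```

The same pattern works for any provider exposing an Anthropic-compatible API.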
sb5550@reddit
Kimi is just a 1T model, and Opus is 5T, let that sink in
Chinese models are not behind, they might well be ahead
Time-Category4939@reddit
I don't really know much about AI model architecture, but why would a 1T parameter model be better than a 5T parameter one? I might be conditioned by my lack of knowledge, but isn't this a situation where bigger is better?
Do you have any good sources of information where I can learn a bit about model architecture and the kinds of things being discussed in this comment?
Guinness@reddit
China is beating the United States these days. It’s sad. But American firms keep refusing to release their models. You would think Linux would’ve shown us the path of prosperity when it comes to tech.
Steus_au@reddit
how would you know it is 5T if they never disclosed that?
yehiaserag@reddit
Can you provide source for this?
mxmumtuna@reddit
It’s “trust me bro”. No way it’s 5T. 🙄
relmny@reddit
Probably the big liar...
nullmove@reddit
If anyone is wondering, this is what Elon Musk had revealed (also apparently Sonnet is 1T).
Dude might be a raging narcissist but he hires Anthropic people so should have better-than-speculation level of knowledge.
relmny@reddit
Why do you think a liar will tell the truth?
Kodix@reddit
Sincerely, with as little bias as I can muster - if Elon 'Most Salient Line of Code' Musk told me the sky was blue, I'd double-check.
Sure, he may well have privileged information. He's in the field. But the man is a bullshitter, through and through.
Oren_Lester@reddit
Totally, I'd even bet on the opposite.
Kathane37@reddit
As if Elon's word has any value. Do you really think Anthropic could run better inference with a 5T model than OpenAI? (Remember how slow 4.5 was?) Do you think Anthropic's 1T (Sonnet) would be barely better than a random 1T from a Chinese lab? The math doesn't math. It's just Elon coping about his 0.5T Grok model.
nullmove@reddit
If you had any capacity for self-reflection, you would know the only one coping here is you.
Some of these days I really can't decide if you lot are any less degenerate than the Elon fanboys. A mistake on my part to evoke his name I guess. Either way, fuck off to singularity, this is not the sub for you.
Orolol@reddit
And is also known to be a liar.
muyuu@reddit
Not really known for sure, but very likely much bigger than Kimi 2.6, judging by how they're scrambling for compute.
mr_Owner@reddit
US military needs more intelligence, so the folks get less.
thatdude391@reddit
It has been super annoying running into significantly lower usage limits in the day or so leading up to things happening in the Middle East.
Ok-Contest-5856@reddit
The private equity firms who dumped billions into Anthropic and OpenAI are in for a really bad time, in my opinion. These open models are neck and neck and way cheaper.
eli_pizza@reddit
The fundraising is downstream of the real issue, which is that they are all incredibly unprofitable
muyuu@reddit
Yep, I agree. Nobody can predict the future for certain, but this is looking really bad for Anthropic specifically, and likely for OpenAI as well. They're applying hardcore austerity to users, and the Chinese are wrecking their pathway to ever making large margins on their compute.
How they are ever making those hundreds of billions back, I have no idea. My current guess is they won't.
Orolol@reddit
I don't think so. Anthropic's business is selling licenses to other businesses. The vast majority of them will accept even a subpar model and degraded performance over using Chinese models. Example: lots of them use Copilot even though it's a worse service in nearly every aspect.
muyuu@reddit
People will only overpay so much. Even Cursor is just distilling Kimi, which launders the Chinese origin enough for American businesses to use it.
Orolol@reddit
Sure, but the money stays in the US, the company can say it uses a US service, etc. See Europe, where companies are happy to use Mistral, despite it being quite far behind in raw model performance, because they can install the model on premise, free from US and Chinese influence.
muyuu@reddit
This thread is about the hundreds of billions (trillions, already) that have gone into OpenAI and Anthropic, which will require a miracle to ever justify, and the potential domino effect of bankruptcies that might trigger.
Orolol@reddit
This thread is about Kimi.
muyuu@reddit
this is the thread we're in: https://old.reddit.com/r/LocalLLaMA/comments/1srd2cc/opus_47_max_subscriber_switching_to_kimi_26/ohe1wza/
Orolol@reddit
Exactly.
muyuu@reddit
no mention of Kimi there
Orolol@reddit
Oh, you know you can see previous comments in the comment chain also.
muyuu@reddit
that is the start of the thread
Orolol@reddit
And this comment is a reply to ...?
muyuu@reddit
that comment is the start of the thread
it's not directly about the reddit post it replies to
Orolol@reddit
Oh yeah, so when the comment says
It's not replying to the OP and it's not about Kimi.
muyuu@reddit
we were talking about the impact on the feasibility of recovering those investments, perhaps you meant to reply to a different message
Orolol@reddit
In the context of open-weight models being on par with SOTA closed models. That's literally the subject of the thread and the subject of the parent comment of this discussion; even your own comment references it.
Eyelbee@reddit
These Chinese open models are trained with far fewer resources than the proprietary US models. If Chinese labs had the same resources, they would already have released a Mythos-tier model, if not better.
positivcheg@reddit
Nah, Chinese models are also trained on outputs of other proprietary models :)
debackerl@reddit
And there was evidence of US models being trained on Chinese models' traces. There's cross-pollination.
Lilhoho_Patience6120@reddit
like Composer 2 from Cursor? hahahaha
positivcheg@reddit
AI inbreeding, let’s go!
RecordingLanky9135@reddit
Chinese models are actually distilled from Claude or ChatGPT; that's why they use fewer resources. And no, they can't be better without those US models.
pydry@reddit
Aint no better hype marketing to build anticipation than "trust us we would release this but it's just too dangerous to release into your filthy hands".
rkoy1234@reddit
I don't doubt for a second that they actually have much better models available privately themselves, for stuff like high importance industries and governmental use.
ain't no way they're releasing their best of the best.
tomz17@reddit
Ish. They are cheaper because they are second (which is an excellent strategy in the current market conditions). If they had to push the envelope while US companies ripped them off, the situation would be flipped.
ebra95@reddit
Except the people and companies already established on OpenAI and Anthropic have become comfortable and won't risk switching to Kimi or other cheaper options.
But for the rest of us this is exceptional news.
I used Kimi K2.5 at first and it was very good. Also, the $100 sub is 5x more than one person can use: I used it day and night, parallel sessions, and never got even halfway, so the $50 one is definitely worth it.
I haven't switched yet, and Xiaomi was free and so was Qwen 3.6 for a period, but it sounds like I'm going back to Kimi.
Temporary-Mix8022@reddit
One thing we have good history on is that in business, no matter the field, someone eventually comes round to cut costs.
When your developers are sitting on $60,000 to $150,000 per seat per year in Anthropic API costs...
And some McKinsey kid goes.. but..if you just rent an entire rack of H100s on someone's server, or even just pay per token on an open model on some reputable SOC2 provider, your costs would literally be 1-5% of what they are now.
Are your engineers really 20 to 100x more productive using Opus?
No. Honestly, I think Opus 4.6 and 4.7 have shown us that we're hitting a plateau. This Mythos thing is just marketing. I could go out tomorrow and find 5 vulnerabilities in FFmpeg, and I guarantee you that while they might exist, they're so niche they'd never be exploited, or aren't even in the current release package.
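That "20 to 100x more productive" question a couple of comments up is just the reciprocal of the claimed 1-5% cost ratio. A quick sanity check with hypothetical per-seat numbers (all assumptions, not real pricing):

```python
# Hypothetical per-seat annual spend; none of these figures are real quotes.
frontier_api_cost = 100_000                   # assumed Opus-class API spend per seat
cost_ratio = 0.03                             # the "1-5%" claim, taking 3% as a midpoint
open_model_cost = frontier_api_cost * cost_ratio

# To justify the frontier price, productivity must scale by the inverse ratio.
breakeven_multiplier = frontier_api_cost / open_model_cost
print(f"Break-even productivity multiplier: {breakeven_multiplier:.1f}x")
```

At 3% the break-even multiplier is about 33x; at the 1-5% endpoints it spans exactly the 20-100x range the comment asks about.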
IrisColt@reddit
Please find just 3, heh
Temporary-Mix8022@reddit
Yeah, you got me. FFmpeg is obviously a tough example: arguably one of the most used open-source libraries in the entire world, and by some metrics potentially the most used.
But I'd point to the "LucasArts games video files" exploit that was found by AI a year ago. The codec isn't even in use anymore and isn't in the live distro.
Anthropic are masters of this kind of marketing: they say some stuff and get the media to whip up the storm for them. They couldn't have bought the kind of advertising that followed. Media outlets like the FT, BBC, NYT, and WSJ picked it up... and who reads them? Politicians, businessmen, etc.
And so suddenly your name is propelled in front of the most powerful people in the world, and if we know anything about politicians, they cannot resist creating a headline of their own, so they all started saying stuff about being scared of Mythos.
And all of that? Over a potentially 25-year-old piece of code, written by a hobbyist for an open-source project, that isn't even shipped.
Dabalam@reddit
I think this gets to an economic point people don't discuss much here: competition in this space means that frontier companies will (most likely) have to reduce prices or fail.
People keep talking about how expensive AI products are in reference to the frontier models. The view is that prices will necessarily rise due to costs. But raising your prices is only possible if you have a near monopoly, or if customers believe there's a significant quality difference between products. People will simply shift to a cheaper model if there isn't a massive performance delta; that's how free markets work.
People dooming about future AI costs need to ask themselves whether they think open-weight models are actually being made more cheaply than Claude, ChatGPT, or Gemini.
If they are, then models like Kimi should outcompete these products in the long term and AI prices should fall, even if that means the current frontier companies fail (assuming markets remain free). If everyone is lying about costs, then yes, prices might go up across the board.
Current trends signal to me that Google and OpenAI have recognised, to some extent, that being uncompetitive in the open-weights space might harm their business in the long term, although they haven't shown they believe frontier models are under threat from this kind of competition (yet).
Civilanimal@reddit
US providers can't compete with the Chinese pricing, and that's the point. China can't compete on development, but they can compete on serving the models. This is why they're so much cheaper or release as open source. China is trying to cut US providers' knees out from under them. The idea is to commoditize the models, and shift the market into serving them rather than developing them.
If US providers were to lower their prices to where the Chinese models are, they WOULD ABSOLUTELY go bankrupt.
meaningego@reddit (OP)
I don't think so. I think most of the developer community is just like you and me: unlike a standard ChatGPT user, we base our tool choice on rational judgment, because we can immediately see which model is doing a better job for us.
ebra95@reddit
Yes, but enterprises pay better and standard users are cheaper to serve.
Turbulent_Pin7635@reddit
Kimi K2.6 is better than the best model from February. The gap between paid and open-weight models is now as short as two months. Kimi is already better than Opus 2.6, which is amazing!
jeremyckahn@reddit
I seriously doubt it. Partnerships, perception, and marketing matter WAY more than actual utility when it comes to commercial success. No Chinese models come close to Anthropic in that regard, at least in the US.
Novel-Dimension-9918@reddit
How to set this up with Kimi to be effective ? With open code or ? How would you go about it ?
namakoo1@reddit
kimi 2.6 better than Claude opus 4.7?
meaningego@reddit (OP)
No, definitely not, but it's usable if you use Claude as the harness or have well-scoped tasks. It's also good with frontend.
namakoo1@reddit
Thanks for the info! That’s really helpful to know. I’m quite interested in Kimi now, especially since you mentioned it works well for agentic workflows and frontend. I’m actually working on an agentic pipeline myself, so I'll definitely check it out. Appreciate the tip! 🙏
FPham@reddit
On Max you get 1 million tokens of context with Opus. It really shines with bigger projects like no other can.
meaningego@reddit (OP)
Even mixing models, I finish my weekly tokens in a day or a day and a half. I alone, at my company, spent an extra $2k in just the last two weeks of the past billing cycle. That's a measure of love, isn't it?
But with the cache fiasco I sort of expected them to come forward and refund some of that, and so I realized my love was being a little too unidirectional. Exploring is always good. I'm afraid the big companies and their investors still haven't fully understood that developers don't behave like regular consumers. We pretty much figure out the various models' strengths and weaknesses. We're pragmatic. AI is a tool, a commodity. They can't just sit on their hands and pray we stay for the vibe.
FPham@reddit
But how? Hitting weekly is physically impossible for me, unless I decide never to sleep. Are you using it for some agentic stuff that sends half the internet to it at each turn?
DramaLlamaDad@reddit
I just use it normally, usually one task in planning mode, one task in implementing mode at all times. I never run long sessions and I'm still sitting here looking at 4% remaining for the week with 25 hours until reset. Turned the thinking down a notch from default, to High but still no bueno.
A lot of it depends on your code base size. If you're working on some small greenfield project with 50k lines, you're probably fine. Most of the stuff I hit is half a million lines or larger.
RemarkableGuidance44@reddit
You're not working hard enough... The difference from Jan usage vs now is huge for my team. We switched.
meaningego@reddit (OP)
I have 6 terminals open at all times, working either on different angles or on different projects altogether. I also made a tool that works overnight for me so I could use all the 5-hour windows (https://www.npmjs.com/package/claude-overnight). I connected some tools we use internally to a pipeline that auto-fixes bugs when people report them via Telegram. I have scrapers that need self-healing features, writing their new login procedures themselves. I shared skills with my colleagues to help Claude design more compatible PPTX files without wasting context (https://www.npmjs.com/package/quicklook-pptx-renderer). A system that saves tens of hours for the accounting department. And so on. Suddenly I can bring forward so much work, I just can't stop. And yeah, I'm not sleeping either.
Arrival-Of-The-Birds@reddit
I got sick of refreshing the Claude usage page every 15 minutes. Had to leave. Went to codex but Kimi looks great too.
gorgono95@reddit
You mean it used to shine... two months ago. Now it's complete garbage: it introduces more problems than it solves and eats the 5-hour limit like nothing (I used to have the 5x plan, btw).
So, I don't know, unless they've released some fix since launch... it's a dumpster fire.
DramaLlamaDad@reddit
I'm old enough to remember when we had to survive on 200k context... which means I'm at least 3 months old! Seriously, the people trying to use the full 1 million context on Opus are the same ones confused about why it's slower, taking longer, and hallucinating more. Just because it can go beyond 200k context doesn't mean you should; 256k context is plenty for any task. If you're jumping right in and telling it to read the whole codebase on every task, you're doing it wrong. Spend a session on a research task to have it collect the code, APIs, and information it actually needs, THEN have it write the plan with that, and then have it implement. One of the codebases I work on is roughly 2 million lines and wouldn't fit in 1M context, and yet, somehow, I survive.
korino11@reddit
You just need a system with orchestration. The orchestrator knows the concept and the project files; its only job is to spawn subagents and hand them tasks. With that logic, 256k context is enough.
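A minimal sketch of that orchestrator pattern. `run_subagent` is a stub standing in for a real LLM call with a fresh context; all names here are hypothetical, not from any actual framework:

```python
# Orchestrator pattern sketch: the orchestrator holds only a compact project
# summary, never the full codebase. Each task runs in an isolated subagent
# that starts from a fresh, small context.

def run_subagent(task: str, files: list[str]) -> str:
    """Stub: a real implementation would spawn an agent with its own context
    window, scoped to just these files, and return a short summary."""
    return f"done: {task} (touched {len(files)} files)"

class Orchestrator:
    def __init__(self, project_summary: str):
        # The only long-lived context: a high-level concept of the project.
        self.project_summary = project_summary
        self.results: list[str] = []

    def dispatch(self, task: str, files: list[str]) -> str:
        # Delegate the heavy lifting; keep only the returned summary.
        result = run_subagent(task, files)
        self.results.append(result)
        return result

orc = Orchestrator("web app: auth, billing, scraper modules")
orc.dispatch("fix login redirect", ["auth/login.py"])
orc.dispatch("add invoice export", ["billing/export.py", "billing/models.py"])
print(orc.results)
```

The point is that the orchestrator's context only grows by one summary line per task, not by the contents of every file the subagents read.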
nuclearbananana@reddit
Imma be honest: I have Max but I've never gone over 120K context. I have no idea how people use that much, and I don't trust any model past about 100K. Not Opus, not Gemini, not GPT; they all just get so incredibly dumb.
willi_w0nk4@reddit
Well, you don't need to paste everything into context to give the agent enough information... that's the definition of bad tool selection and usage, coupled with bad prompt engineering. That's how you end up with a really expensive workflow.
VonDenBerg@reddit
I'm curious about China's initiative to push open-source models, and the financial gain they could have if they closed them.
sweetbeard@reddit
I want open source to win as much as the next guy, but people comparing Kimi 2.6 to Opus are delusional. The damn thing just. keeps. thinking.
meaningego@reddit (OP)
Yeah, it can feel that way. I made some rules telling it not to overthink stuff.
icecrown_glacier_htm@reddit
Where do you use the Kimi models? Via moonshot.ai directly or elsewhere? What kind of limits are enforced?
meaningego@reddit (OP)
I'm on Kimi code. Seems pretty good so far.
HynDuf@reddit
Hi, does the $20 plan have ample usage? How does it compare to the $20 Claude Code plan?
meaningego@reddit (OP)
I'm not sure, I got the bigger one. Purely based on feeling, it seems like two or three times more. It would be even more, but I have to say sometimes you need two or three iterations where Opus xhigh will close it in one.
seunosewa@reddit
They also have a VS Code Extension with lovely typography.
Zemanyak@reddit
Do we have any token comparison for similar plans/prices? Specs are just opaque as fuck. The $19 Kimi plan says:
Do I have a lot or a little ? More usage than Codex/Claude or less ?
hellomistershifty@reddit
Well, if API prices are anything to go by:
meaningego@reddit (OP)
I don't know, but the limits look plentiful compared to the Max plan.
Worried_Drama151@reddit
You are dumb lol… 😂 https://x.com/bridgemindai/status/2046313533743468993/video/1?s=46 kimi models are trash
redbike@reddit
Good luck with that.
Pablo_Offline_AI@reddit
Not JUST lazy, it feels like it's MEANT to eat up usage. I turned off my $200-a-month subscription for a few weeks, and ONE prompt/reply hit my usage limit. "Try again in 2 hours" BS.
Thepandashirt@reddit
Be careful using Kimi 2.6 if you have anything proprietary you don't want used in training. There's no opt-out from your data being used for future training.
bithatchling@reddit
Honestly, Kimi 2.6 has been surprisingly solid for large context windows where Opus usually starts getting lazy or hitting limits. I still think Claude has a slight edge in logic nuance, but for raw dev utility, Kimi is making a very strong case.
xadiant@reddit
Do people really use 1M context? We just need better agent CLIs. I don't think there are many use cases where you have to even use 500k tokens in context. That's like 4-5 books worth of tokens.
iVtechboyinpa@reddit
If I'm debugging, I use it heavily. I don't really care about context degradation.
But I always plan & clear, then execute. It really helps with debugging afterwards too, since I can fork the session and hammer out bugs without worrying about handoffs until I'm ready to make another plan.
meaningego@reddit (OP)
You never really go to 1M. On average I keep a safe threshold at 40% of the actual context, before degradation hits too hard, and then I prepare a new session. In reality some models are much better than others at finding the needle in the haystack (actually recalling useful information from inside the used context).
But for all models, as context pressure increases you will always see a steep degradation in quality. So larger context windows add value because they generally mean you can continue a multi-turn conversation longer and still get accurate results, or the model can gather much more information about your project before deciding how to intervene.
And when you have that, at the expense of using more tokens, you get much more productive.
almbfsek@reddit
I truly believe tools will be the actual difference-maker in the future. There are endless inefficiencies in all coding agents, from Claude Code to OpenCode; it's like they're hell-bent on adding anything and everything with zero regard for context usage.
RealDedication@reddit
In-Context learning. If you have a model very well adapted to your needs, every compaction is unpleasant. :)
debackerl@reddit
Thx for sharing your experience! I'm waiting for Fireworks' Fire pass to upgrade to K2.6 as well! It's always super snappy. I'll keep my local Qwen3.6 35B A3B as a backup
_derpiii_@reddit
I appreciate posts like these, they really put things into perspective. Thank you for sharing :)
Steus_au@reddit
Right behind you, but I'll miss Opus's hallucinations, they're so great...
singh_taranjeet@reddit
The cost delta is brutal when you're running a team on Max. We've been testing memory layers to cut down on context bloat across sessions; it helps a ton when models start getting lazy or you're constantly hitting token limits. Are you tracking per-task token usage with Kimi vs what Opus was burning through?
meaningego@reddit (OP)
I'm not tracking that precisely, but I feel I would have blown through my 5-hour window multiple times with so many sessions running at the same time. But now I'm spreading out: since I have a bunch of subscriptions, I'm also testing the models with the same prompts to pick my favorite, and as you'd imagine there isn't a single best at everything.
philo-foxy@reddit
How does z.ai's GLM 5.1 compare? They too were going on about its capability in long-horizon tasks, and a few folks here said they liked it for coding.
eposnix@reddit
I feel like I see these posts literally every Kimi release then no one talks about them again.
meaningego@reddit (OP)
I always come to Reddit to read opinions about stuff, and it's the only place I trust (collectively) to learn from other people's experience with AI models, so I figured this time it could be useful to share my own experience and save someone else a buck and a few tokens. Then of course it'll be surpassed by another post next week. But what else can we do? We're living on the edge.
EstarriolOfTheEast@reddit
With each new release the frontier is reset, so the utility of a post like this increases again as some open model catches up. That said, there's no need to keep making posts like this during quiet periods (with respect to frontier performance, which so far only closed models have held).
Worried-Squirrel2023@reddit
the laziness on 4.7 is the killer for me. it's not even quality, it's that it stops mid-task or wraps things up before they're actually done. running into a smaller context window with kimi is way less painful than fighting with a frontier model that won't commit. how are you finding the tool calling reliability? that's where I've seen the biggest gap between kimi and opus in real workflows.
meaningego@reddit (OP)
I'm experimenting. In some runs it feels like it works best with their own CLI. I've also seen it pollute its own chain of thought with the wrong data, but at least you can read it as it thinks and intervene.
jmememes@reddit
I'm with you. Opus 4.7 is just too expensive, and I just don't feel the difference vs Kimi 2.6.
isthisit0923@reddit
My Max plan expires in 9 days; I'm thinking about switching too.