About Kimi K2.6
Posted by Exact_Law_6489@reddit | LocalLLaMA | 44 comments
Recently, I’ve seen lots of ads for the Kimi K2.6 across various social media platforms, and I’d like to hear from people who have used it.
Is it genuinely that good, or is it just a model with impressive benchmark scores that doesn't perform well in real use?
Ariquitaun@reddit
I'm using it for work and it's really good, easily as good as Opus 4.5. Not quite Opus 4.6 before it was nerfed back in March. Very impressive. Paired with deepseek 4 flash for subagent workers it's the cheapest most effective coding pair I've ever tried.
MoodDelicious3920@reddit
Curious, what do you use subagents for? And how do you deal with the context that goes to a subagent?
Ariquitaun@reddit
If I'm doing anything substantial, I never ask the agent I communicate with to do it. I use a thinking model on that agent, then ask it to dispatch sub-agents for exploration, fact gathering, and implementation. You ask it to give them a pristine context with as much task-focused information as possible so they can do the job without improvising a solution, and you use cheaper models on those sub-agents. It keeps your main context clean of noise and your costs down.
If you use opencode, you can set different agent roles (agent and sub-agent for build, exploration, planning etc) and a lot of the time they'll be used automatically.
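For reference, a minimal sketch of what such role-based agent configuration might look like in an opencode `opencode.json`. The agent names, model identifiers, and tool restrictions here are illustrative assumptions, not the commenter's actual setup; check opencode's own config docs for the exact schema.

```json
{
  "$schema": "https://opencode.ai/config.json",
  "agent": {
    "plan": {
      "description": "Primary agent: a thinking model that plans and dispatches sub-agents",
      "mode": "primary",
      "model": "moonshotai/kimi-k2.6"
    },
    "explore": {
      "description": "Read-only codebase exploration on a cheaper model",
      "mode": "subagent",
      "model": "deepseek/deepseek-v4-flash",
      "tools": { "write": false, "edit": false }
    },
    "build": {
      "description": "Implementation sub-agent; gets a focused, pristine task brief",
      "mode": "subagent",
      "model": "deepseek/deepseek-v4-flash"
    }
  }
}
```

The idea matches the workflow above: the primary agent stays on the expensive thinking model, while exploration and implementation run on cheaper models with tightly scoped context.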
gestapov@reddit
Do you go by usage on like open router or use a sub?
666666thats6sixes@reddit
I accidentally used it instead of Sonnet in regular work in opencode and only noticed it once finished, the $ burn was ⅒ of what I normally had at that point. It's perfectly adequate for SWE work and it has vision, which is nice for website debugging through playwright.
gavff64@reddit
A tool that could auto backup a real project in a temporary environment and then randomly choose a model from a list anonymously. Hm, could be a neat idea.
ansibleloop@reddit
I had Pi build itself a container to run in where the mounted directory is the repo
Just need a backup of the repo and you're sorted
666666thats6sixes@reddit
That's similar to what I use, although I mainly evaluate local models and tooling with it. I had qwen3.6 27b look at the sql benchmark and had it build a similar UX for general testing. Problems are in columns, each line is a new model/parameter set/harness. Each problem is a git repo with a prompt and tests, the suite runs the agent in a bubblewrap jail with the repo mounted via --overlay and collects metrics like tokens, number of tool calls, how many retries until tests pass, wallclock time.
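The jail-plus-overlay setup described above could be sketched roughly like this in Python. The bind-mount set, the `run_tests.sh` entry point, and the metric fields are assumptions for illustration; a real harness would also pull token and tool-call counts out of the agent's logs rather than just retries and wall-clock time.

```python
import subprocess
import tempfile
import time


def bwrap_cmd(repo_dir: str, upper: str, work: str, argv: list[str]) -> list[str]:
    """Build a bubblewrap invocation that mounts the problem repo as a
    writable overlay at /repo, so the pristine git checkout is untouched."""
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--proc", "/proc",
        "--dev", "/dev",
        "--overlay-src", repo_dir,          # read-only lower layer
        "--overlay", upper, work, "/repo",  # agent writes land in `upper`
        "--chdir", "/repo",
        *argv,
    ]


def run_attempt(repo_dir: str, agent_cmd: list[str], max_retries: int = 3) -> dict:
    """Run the agent in the jail, retrying until the repo's tests pass or
    retries are exhausted; collect retry count and wall-clock time."""
    metrics = {"retries": 0, "passed": False, "wallclock_s": 0.0}
    start = time.monotonic()
    for attempt in range(max_retries):
        # Fresh overlay per attempt: each retry starts from the clean repo.
        upper, work = tempfile.mkdtemp(), tempfile.mkdtemp()
        subprocess.run(bwrap_cmd(repo_dir, upper, work, agent_cmd))
        tests = subprocess.run(bwrap_cmd(repo_dir, upper, work, ["sh", "./run_tests.sh"]))
        metrics["retries"] = attempt
        if tests.returncode == 0:
            metrics["passed"] = True
            break
    metrics["wallclock_s"] = time.monotonic() - start
    return metrics
```

Keeping the lower layer read-only via `--overlay-src` is what makes the "each problem is a git repo" design cheap: no re-cloning between runs, and a failed attempt can't corrupt the suite.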
Silver-Champion-4846@reddit
Basically openrouter but with some bells and whistles
IamFondOfHugeBoobies@reddit
I started testing it yesterday and I'm a big fan. It's very smart, very well trained. It obeys my instructions without going off on tangents like GPT or Gemini models will.
With Anthropic shitting the bed more and more I have really high hopes for this model family even if the context is a bit low for my taste right now.
cloudcity@reddit
I daily drive it, and I find it to be a smidge worse than Sonnet 4.6.
apeapebanana@reddit
my current meta is on building out frameworks for webdev:
- qwen3.6 27b for brainstorm, personal assistant
- gemini 3.1 pro for analysis
- kimi k2.6 for building (prior to this minimax m2.7 but its not hitting target)
overall it has a good head, follows instructions well, and can hold its own opinion when given conflicting information. cheap too!
now i'm trying out deepseek v4 flash, which seems to be driving really smoothly
Academic-Novice@reddit
Only tried it once. I gave 3 models the same mixed front-end + backend plan to implement and then compared its results to GLM5.1 and minimax2.7.
In the end it did the worst even though minimax didn't even touch the frontend stuff ~ just too many mistakes, while also being the most expensive and slowest of the 3. (Though speed definitely gets influenced by me only using ZDR-compliant cloud providers.)
gestapov@reddit
So glm 5.1 won?
Academic-Novice@reddit
In my test yes. It was an agentic workflow so i let the model decide when they thought they finished.
In the end GLM had produced the fewest bugs (and also tried to do both front-end and backend). Though it did also make mistakes, like not creating any UI elements to let a user actually use the front-end features it started implementing.
In favour of minimax are the cost and speed to finish, though. So with smaller tasks and the right prompting/supervision it should also do well, while costing only half as much as GLM and being faster.
quickreactor@reddit
From my experience, it really is that good
00Dazzle@reddit
It's genuinely very good
natermer@reddit
In my experience Kimi K2.6 isn't quite on par with Opus 4.7, but it is close.
It is sometimes better, sometimes worse. I think that Opus has an edge, but it isn't anything like a night and day difference.
MyHobbyIsMagnets@reddit
I’m spoiled by Codex now. It’s about as good as Claude Opus was 1 year ago, but it needs to be watched much more closely than Codex 5.5.
KURD_1_STAN@reddit
Damn, so it is beating opus 4.7 then.
MyHobbyIsMagnets@reddit
I wouldn’t say it is. Codex > Opus > Kimi
KURD_1_STAN@reddit
It was a joke about anthropic's changes in its models
aalluubbaa@reddit
I can only speak from my experience. I only used it with Hermes as the main agent. I’d ask it to, say, create a real-time TTS pipeline using my PC hardware that gets as close to gpt4o as possible; it failed miserably even though I reprompted it multiple times and burnt through millions of tokens.
blargh4@reddit
Personally I found it fairly mediocre for real-world usage compared to daily-driver Sonnet 4.6 at work. Of course, this was with a different harness so not an apples-to-apples comparison.
Iory1998@reddit
Deepseek-v4-pro is my daily driver. Before, it was Gemini-3.1 pro, but not anymore.
spudlyo@reddit
I daily drive gemini-3.1-pro-preview-customtools on high, but got really solid results from Deepseek-v4-pro all day yesterday, and at a fraction of the price. Slower, more thoughtful, but ultimately it feels almost as capable. To me it seems more detail oriented, which was very useful in a long planning session, where it meticulously kept track of our list of items and issues to address along with their resolution. Gemini-3.1 is like "bro, just let me go, I'm done talking about it."
Iory1998@reddit
And it has a very good attention span. My conversation with it spans over 700K tokens and it remains coherent and remembers details from early chat turns.
Iory1998@reddit
Kimi-2.6 is too expensive, to be honest. At that price tag, you might as well use Gemini.
Hoak-em@reddit
Kimi fits in the places that GLM-5.1 doesn't (frontend design), though for everything else I still use GLM-5.1
I use forgecode as a harness for glm-5.1, but opencode might be a better harness for Kimi atm, given that I find it misbehaving in forge and opencode has prompt and tool optimizations for Kimi
For stuff other than agentic coding, Kimi is probably the best all-around.
Both are difficult to run locally unless you have a helluva homelab, and if accessing via a cloud API, Kimi is the most reliable since it's native int4, so the model you get is the same as the one in the benchmarks.
gestapov@reddit
Does any kimi provider offer a sub?
Hoak-em@reddit
Kimi coding plan or upcoming fire pass v2 from fireworksai, but v2 is locked to v1 early access users, so we won't know if it'll be offered until news about v3 comes out
Opencode go plan also seems like a good option, but I haven't tried it yet
Hoak-em@reddit
Also if you have the resources for local, native int4 helps a lot with hosting -- you can use AMX on xeons and 3090s or arc pro b70 -- kt-kernel is a good option for this model
Technical-Earth-3254@reddit
I can't pin down the best OSS model, but K2.6 is one of the top 4 OSS models (all tied for 1st because all are great for me).
mimrock@reddit
I had better luck with Mimo-V2.5-Pro, but that's only a sample of one task (python, via opencode) and via openrouter, not locally.
Uriziel01@reddit
From my experience it's very good, ignoring the benchmarks I've preferred it over GLM5.1, Minimax M2.7 and MiMo V2.5
jon23d@reddit
I use it almost exclusively, for coding.
LivingHighAndWise@reddit
From my experience, it's about Sonnet-level intelligence for agentic coding.
Lissanro@reddit
Yes, Kimi K2.6 is good (I run Q4_X quant with llama.cpp on my PC). For UI and frontend work it is better than GLM 5.1 in my experience. However, for the backend work GLM 5.1 (tested IQ4 quant) did better for me in most cases.
Eyelbee@reddit
It overthinks a little bit and is too generous with tool calls, but I haven't really pushed it hard.
wanielderth@reddit
lol isn’t kimi the goat? What’s this question?
Specter_Origin@reddit
It's a pretty good model! Their own plans are not very good though :(
The base model is straight up better than GPT 5.4 for real use cases, response format, etc.
tirprox@reddit
it is good.
Riseing@reddit
Works great, has no guardrails so it can be used on any project with no denials.
Thinks a bit too long at times, and can get confused in larger codebases.
Its my daily driver.
Real_Ebb_7417@reddit
It's good. I'd generally trust the benchmarks for plain model comparisons between Kimi/Claude/GPT (so GPT is better, obviously), but I really liked working with Kimi when testing their subscription.