About Kimi K2.6
Posted by Exact_Law_6489@reddit | LocalLLaMA | 44 comments
Recently, I’ve seen lots of ads for the Kimi K2.6 across various social media platforms, and I’d like to hear from people who have used it.
Is it genuinely that good, or is it just a model with impressive benchmark scores that doesn't perform well in real use?
Ariquitaun@reddit
I'm using it for work and it's really good, easily as good as Opus 4.5. Not quite Opus 4.6 before it was nerfed back in March. Very impressive. Paired with deepseek 4 flash for subagent workers it's the cheapest most effective coding pair I've ever tried.
MoodDelicious3920@reddit
Curious, what do you use subagents for? And how do you deal with the context that goes to a subagent?
Ariquitaun@reddit
If I'm doing anything substantial, I never ask the agent I communicate with to do it. I use a thinking model on that agent, then ask it to dispatch sub-agents for exploration, fact gathering, and implementation. You ask it to give them a pristine context with as much task-focused information as possible so they can do the job without improvising a solution, and you use cheaper models on those sub-agents. It keeps your main context clean of noise and your costs down.
If you use opencode, you can set different agent roles (agent and sub-agent for build, exploration, planning etc) and a lot of the time they'll be used automatically.
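For reference, a minimal sketch of what such role-based agent configuration might look like in an opencode `opencode.json`. The agent names, model identifiers, and tool restrictions here are illustrative assumptions, not the commenter's actual setup; check opencode's own config docs for the exact schema.

```json
{
  "$schema": "https://opencode.ai/config.json",
  "agent": {
    "plan": {
      "description": "Primary agent: a thinking model that plans and dispatches sub-agents",
      "mode": "primary",
      "model": "moonshotai/kimi-k2.6"
    },
    "explore": {
      "description": "Read-only codebase exploration on a cheaper model",
      "mode": "subagent",
      "model": "deepseek/deepseek-v4-flash",
      "tools": { "write": false, "edit": false }
    },
    "build": {
      "description": "Implementation sub-agent; gets a focused, pristine task brief",
      "mode": "subagent",
      "model": "deepseek/deepseek-v4-flash"
    }
  }
}
```

The idea matches the workflow above: the primary agent stays on the expensive thinking model, while exploration and implementation run on cheaper models with tightly scoped context.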
gestapov@reddit
Do you go by usage on like open router or use a sub?
666666thats6sixes@reddit
I accidentally used it instead of Sonnet in regular work in opencode and only noticed it once finished, the $ burn was ⅒ of what I normally had at that point. It's perfectly adequate for SWE work and it has vision, which is nice for website debugging through playwright.
gavff64@reddit
A tool that could auto backup a real project in a temporary environment and then randomly choose a model from a list anonymously. Hm, could be a neat idea.
ansibleloop@reddit
I had Pi build itself a container to run in where the mounted directory is the repo
Just need a backup of the repo and you're sorted
666666thats6sixes@reddit
That's similar to what I use, although I mainly evaluate local models and tooling with it. I had qwen3.6 27b look at the sql benchmark and had it build a similar UX for general testing. Problems are in columns, each line is a new model/parameter set/harness. Each problem is a git repo with a prompt and tests, the suite runs the agent in a bubblewrap jail with the repo mounted via --overlay and collects metrics like tokens, number of tool calls, how many retries until tests pass, wallclock time.
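The jail-plus-overlay setup described above could be sketched roughly like this in Python. The bind-mount set, the `run_tests.sh` entry point, and the metric fields are assumptions for illustration; a real harness would also pull token and tool-call counts out of the agent's logs rather than just retries and wall-clock time.

```python
import subprocess
import tempfile
import time


def bwrap_cmd(repo_dir: str, upper: str, work: str, argv: list[str]) -> list[str]:
    """Build a bubblewrap invocation that mounts the problem repo as a
    writable overlay at /repo, so the pristine git checkout is untouched."""
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--proc", "/proc",
        "--dev", "/dev",
        "--overlay-src", repo_dir,          # read-only lower layer
        "--overlay", upper, work, "/repo",  # agent writes land in `upper`
        "--chdir", "/repo",
        *argv,
    ]


def run_attempt(repo_dir: str, agent_cmd: list[str], max_retries: int = 3) -> dict:
    """Run the agent in the jail, retrying until the repo's tests pass or
    retries are exhausted; collect retry count and wall-clock time."""
    metrics = {"retries": 0, "passed": False, "wallclock_s": 0.0}
    start = time.monotonic()
    for attempt in range(max_retries):
        # Fresh overlay per attempt: each retry starts from the clean repo.
        upper, work = tempfile.mkdtemp(), tempfile.mkdtemp()
        subprocess.run(bwrap_cmd(repo_dir, upper, work, agent_cmd))
        tests = subprocess.run(bwrap_cmd(repo_dir, upper, work, ["sh", "./run_tests.sh"]))
        metrics["retries"] = attempt
        if tests.returncode == 0:
            metrics["passed"] = True
            break
    metrics["wallclock_s"] = time.monotonic() - start
    return metrics
```

Keeping the lower layer read-only via `--overlay-src` is what makes the "each problem is a git repo" design cheap: no re-cloning between runs, and a failed attempt can't corrupt the suite.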
Silver-Champion-4846@reddit
Basically openrouter but with some bells and whistles
IamFondOfHugeBoobies@reddit
I started testing it yesterday and I'm a big fan. It's very smart, very well trained. It obeys my instructions without going off on tangents like GPT or Gemini models will.
With Anthropic shitting the bed more and more I have really high hopes for this model family even if the context is a bit low for my taste right now.
cloudcity@reddit
I daily drive it, and I find it to be a smidge worse than Sonnet 4.6.
apeapebanana@reddit
my current meta is on building out frameworks for webdev:
- qwen3.6 27b for brainstorm, personal assistant
- gemini 3.1 pro for analysis
- kimi k2.6 for building (prior to this minimax m2.7 but its not hitting target)
overall it has a good head, follows instructions well, and can hold its own opinion when given conflicting information. cheap too!
now i'm trying out deepseek v4 flash, which seems to be driving really smoothly
Academic-Novice@reddit
Only tried it once. I gave 3 models the same mixed front-end + backend plan to implement and then compared its results to GLM5.1 and minimax2.7.
In the end it did the worst even though minimax didn't even touch the frontend stuff ~ just too many mistakes, while also being the most expensive and slowest of the 3. (Though speed definitely gets influenced by me only using ZDR-compliant cloud providers.)
gestapov@reddit
So glm 5.1 won?
Academic-Novice@reddit
In my test yes. It was an agentic workflow so i let the model decide when they thought they finished.
In the end GLM had produced the fewest bugs (and also tried to do both front-end and backend). Though it did also make mistakes, like not creating any UI elements to let a user actually use the front-end features it started implementing.
In favour of minimax are the cost and speed to finish, though. So with smaller tasks and the right prompting/supervision it should also do well, while costing only half as much as GLM and being faster.
quickreactor@reddit
From my experience, it really is that good
00Dazzle@reddit
It's genuinely very good
natermer@reddit
In my experience Kimi K2.6 isn't quite on par with Opus 4.7, but it is close.
It is sometimes better, sometimes worse. I think that Opus has an edge, but it isn't anything like a night and day difference.
MyHobbyIsMagnets@reddit
I’m spoiled by Codex now. It’s about as good as Claude Opus was 1 year ago, but it needs to be watched much more closely than Codex 5.5.
KURD_1_STAN@reddit
Damn, so it is beating opus 4.7 then.
MyHobbyIsMagnets@reddit
I wouldn’t say it is. Codex > Opus > Kimi
KURD_1_STAN@reddit
It was a joke about anthropic's changes in its models
aalluubbaa@reddit
I can only speak from my experience. I only used it with Hermes as the main agent. I’d ask it to, say, create a real-time TTS pipeline using my PC hardware that gets as close to gpt4o as possible; it failed miserably even though I reprompted it multiple times and burnt through millions of tokens.
blargh4@reddit
Personally I found it fairly mediocre for real-world usage compared to daily-driver Sonnet 4.6 at work. Of course, this was with a different harness so not an apples-to-apples comparison.
Iory1998@reddit
Deepseek-v4-pro is my daily driver. Before, it was Gemini-3.1 pro, but not anymore.
spudlyo@reddit
I daily drive gemini-3.1-pro-preview-customtools on high, but got really solid results from Deepseek-v4-pro all day yesterday, and at a fraction of the price. Slower, more thoughtful, but ultimately it feels almost as capable. To me it seems more detail oriented, which was very useful in a long planning session, where it meticulously kept track of our list of items and issues to address along with their resolution. Gemini-3.1 is like "bro, just let me go, I'm done talking about it."
Iory1998@reddit
And it has a very good attention span. My conversation with it spans over 700K tokens and it remains coherent and remembers details from early chat turns.
Iory1998@reddit
Kimi-2.6 is too expensive, to be honest. At that price tag, you might as well use Gemini.
Hoak-em@reddit
Kimi fits in the places that GLM-5.1 doesn't (frontend design), though for everything else I still use GLM-5.1
I use forgecode as a harness for glm-5.1, but opencode might be a better harness for Kimi atm, given that I find it misbehaving in forge and opencode has prompt and tool optimizations for Kimi
For stuff other than agentic coding, Kimi is probably the best all-around.
Both are difficult to run locally unless you have a helluva homelab, and if accessing via a cloud API, Kimi is the most reliable since it's native int4, so the model you get is the same as the one in the benchmarks.
gestapov@reddit
Does any kimi provider offer a sub?
Hoak-em@reddit
Kimi coding plan or upcoming fire pass v2 from fireworksai, but v2 is locked to v1 early access users, so we won't know if it'll be offered until news about v3 comes out
Opencode go plan also seems like a good option, but I haven't tried it yet
Hoak-em@reddit
Also if you have the resources for local, native int4 helps a lot with hosting -- you can use AMX on xeons and 3090s or arc pro b70 -- kt-kernel is a good option for this model
Technical-Earth-3254@reddit
I can't pin down the best OSS model, but K2.6 is one of the top 4 OSS models (all tied for 1st because all are great for me).
mimrock@reddit
I had better luck with Mimo-V2.5-Pro, but that's only a sample of one task (python, via opencode) and via openrouter, not locally.
Uriziel01@reddit
From my experience it's very good, ignoring the benchmarks I've preferred it over GLM5.1, Minimax M2.7 and MiMo V2.5
jon23d@reddit
I use it almost exclusively, for coding.
LivingHighAndWise@reddit
From my experience, it's about Sonnet-level intelligence for agentic coding.
Lissanro@reddit
Yes, Kimi K2.6 is good (I run Q4_X quant with llama.cpp on my PC). For UI and frontend work it is better than GLM 5.1 in my experience. However, for the backend work GLM 5.1 (tested IQ4 quant) did better for me in most cases.
Eyelbee@reddit
It overthinks a little bit and is too generous with tool calls, but I haven't really pushed it hard.
wanielderth@reddit
lol isn’t kimi the goat? What’s this question?
Specter_Origin@reddit
It's a pretty good model! Their own plans are not very good though :(
The base model is straight up better than GPT 5.4 for real use cases, response format, etc.
tirprox@reddit
it is good.
Riseing@reddit
Works great, has no guardrails so it can be used on any project with no denials.
Thinks a bit too long at times, and can get confused in larger codebases.
Its my daily driver.
Real_Ebb_7417@reddit
It's good. I'd generally trust the benchmarks for plain model comparisons between Kimi/Claude/GPT (so GPT is better, obviously), but I really liked working with Kimi when testing their subscription.