Anyone actually coded with Kimi K2 Thinking?
Posted by Federal_Spend2412@reddit | LocalLLaMA | View on Reddit | 42 comments
Curious how its debug skills and long-context feel next to Claude 4.5 Sonnet—better, worse, or just hype?
Trollfurion@reddit
I’ve tried it to code a website from a prompt; it did worse than Qwen3 VL 32B, for example
TheRealMasonMac@reddit
It makes coding mistakes that make me not want to use it for actual coding. Might be good for planning side? Not sure.
shaman-warrior@reddit
How’d you use it?
TheRealMasonMac@reddit
I prompted the official API with a simple edit to improve the CSS of an existing simple self-contained webapp, and it broke the JavaScript when it changed classes without updating the JS. GLM-4.6 could do this without even needing thinking.
shaman-warrior@reddit
Kimi K2 Thinking as the model? I tried it yesterday and today with their coding plan, but as the model I used kimi-k2-thinking instead of kimi-for-coding.
TheRealMasonMac@reddit
For the webapp case, it was just the straight API call.
TheRealMasonMac@reddit
Yeah, as a model.
kogitatr@reddit
I regret subscribing even to their $19 plan. In my experience, it's slower than Sonnet, delivers worse results, and sometimes disobeys the prompt.
shaman-warrior@reddit
I also subscribed. What model did you use?
lemon07r@reddit
It's currently broken in all agents other than Kimi CLI, because the model emits tool calls inside its reasoning tags and no other agent supports that yet. Should hopefully be fixed soon in most agents.
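To illustrate what an agent-side fix involves: the harness has to pull tool-call payloads out of the reasoning stream itself rather than reading a separate tool-calls field. A minimal Python sketch, using a hypothetical `<tool_call>` markup purely for illustration (Moonshot's actual wire format may differ):

```python
import json
import re

def extract_tool_calls(text):
    """Pull JSON tool-call payloads out of an interleaved reasoning string.

    The <tool_call>...</tool_call> tags here are illustrative, not
    Moonshot's real format; the point is that the calls live inside the
    reasoning text, so a harness that only checks a top-level tool_calls
    field never sees them.
    """
    return [
        json.loads(m)
        for m in re.findall(r"<tool_call>(.*?)</tool_call>", text, re.S)
    ]

# Hypothetical reasoning trace with an embedded tool call
reasoning = (
    "Let me check the file first. "
    '<tool_call>{"name": "read_file", "arguments": {"path": "app.py"}}</tool_call> '
    "Now I can see the bug."
)

for call in extract_tool_calls(reasoning):
    print(call["name"], call["arguments"])
```

An agent that doesn't do this kind of extraction just treats the whole trace as thinking text, which matches the "broken in all agents other than Kimi CLI" behavior described above.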
vincentz42@reddit
This needs to be upvoted higher. I used Kimi CLI and found the model to be very smart in agentic coding.
ps5cfw@reddit
I've given it a fairly complex task (fix a bug in a fairly complex .NET repository class) and it solved it in two shots.
It's OK, it tends to think a lot, but it's not too much
Federal_Spend2412@reddit (OP)
Thanks, I'm planning to try using Kilo Code + Kimi K2 Thinking in my project to test it out.
Brave-Hold-9389@reddit
Use Claude Code; it allows Kimi to use a different type of reasoning.
GregoryfromtheHood@reddit
How do you use it with Claude Code? I've tried Claude Code Router a few times to use different models but could never get the model to act right. I always default back to Roo Code for other models because they just work there, even if it is a bit of a context hog.
Brave-Hold-9389@reddit
Here, check this out
GregoryfromtheHood@reddit
Oh. Anthropic compatible endpoint via a cloud provider, yeah nah I'm not really interested in that. I'm talking about running models locally using openai compatible API endpoints.
I think something in the conversion process isn't 100% right, and I haven't been able to get very good performance out of Claude Code with local models.
AI_should_do_it@reddit
I assume you ran it locally? What’s the hardware?
YouAreTheCornhole@reddit
It should be a lot better for the amount of hype
loyalekoinu88@reddit
Agreed. It’s not bad BUT it also isn’t a coding model. It’s an agent/general model. How much of that model space is dedicated to code is up for debate.
YouAreTheCornhole@reddit
If it weren't gigantic I'd have more hope here, but for its size it should be a lot better than it is
loyalekoinu88@reddit
I mostly agree but do we have other open trillion parameter models to compare to that are better? I think this model as a base will produce great coding focused models of similar size that are better in that domain. Just a matter of time. :)
llmentry@reddit
We have open models with far fewer params that are arguably better. Does that count?
YouAreTheCornhole@reddit
I hope so but it's kind of like throwing a poop at a house fire, especially when models way smaller are doing things better
loyalekoinu88@reddit
That’s a fair assessment. What models are you presently using, and for what kind of coding work?
YouAreTheCornhole@reddit
I mainly use Sonnet 4.5 and all kinds of stuff, mainly Python and Go, and C++. Lots of AI and ML stuff
Federal_Spend2412@reddit (OP)
GLM 4.6 isn't as powerful as advertised. I'm just a little worried about how Kimi K2 Thinking compares to GLM 4.6 in the same situations.
YouAreTheCornhole@reddit
Kimi K2 Thinking is definitely worse than GLM 4.6
Federal_Spend2412@reddit (OP)
I just know Glm 4.6 > minimax m2
Final-Rush759@reddit
For me, minimax m2 is better than GLM-4.6. It all depends on what you want to do. None of models are perfect. If you have problems, try a different model. I think GPT-5 is very good in fixing bugs.
usernameplshere@reddit
I'm curious, what scenario did you use it in?
Brave-Hold-9389@reddit
in frontend
Pink_da_Web@reddit
No, it's not. For me, it's much better than the GLM 4.6. Why do you think that?
TheRealGentlefox@reddit
Advertised by who? A lot of coders vouch for its capabilities. I haven't done super extensive testing yet but I quite like it.
redragtop99@reddit
GLM 4.6 is the best local model I’ve used for text. It’s consistent and right.
Born_Operation_6222@reddit
It seems it's only good on the agentic and IF (instruction-following) scores? On all other scores, it's worse than DeepSeek R1.
Wishitweretru@reddit
Tried it for a day; it kept failing during project onboarding. Figured it might be growing pains, so I'll try again in a couple of days.
kaggleqrdl@reddit
It was impressive on a simple task, but on a larger refactoring job it broke pretty badly. It seems to overcomplicate things. Worth a few more attempts, I think.
Special_Cup_6533@reddit
For single code files it is fine, but when I introduce multiple files in a code base, it falls apart and makes many errors, and is unable to fix them. I end up swapping to deepseek and deepseek fixes them all.
mileseverett@reddit
I gave it my standard, fairly complex computer vision architecture modification questions, and it consistently fucked up the dimensions of tensors and couldn't fix itself even after multiple rounds. I've found that only closed models get these correct.
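For context on why tensor dimensions trip models up: the bookkeeping usually comes down to the standard convolution output-size formula, and dropping a padding or stride term anywhere in a deep stack produces mismatched shapes several layers later. A minimal sketch of that formula in plain Python (framework-agnostic; the ResNet-stem numbers below are just a familiar example):

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output spatial size of a 2D convolution along one axis:
    floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# 224x224 input through a 7x7 kernel, stride 2, padding 3
# (the classic ResNet stem) halves the spatial size to 112.
print(conv2d_out(224, 7, stride=2, padding=3))

# A 3x3 "same" conv (stride 1, padding 1) preserves the size.
print(conv2d_out(32, 3, stride=1, padding=1))
```

Getting one of these terms wrong in a proposed architecture change is exactly the kind of error that then cascades into runtime shape mismatches the model can't reason its way back out of.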
mborysow@reddit
I just want to know if anyone has managed to get it running on SGLang or vLLM with tool calling working decently.
It seems like it's a known issue, but it makes the model totally unsuitable for things like Roo Code / Aider. I understand the fix is basically an enforced grammar for the tool-calling section; hopefully that will come soon. We have limited resources to run models, so if it can't also do tool calling, we need to save the room for something else. :(
Seems like an awesome model.
For reference:
https://blog.vllm.ai/2025/10/28/Kimi-K2-Accuracy.html
https://github.com/MoonshotAI/K2-Vendor-Verifier
Can't remember if it was vLLM or sglang for this run, but:
{
  "model": "kimi-k2-thinking",
  "success_count": 1998,
  "failure_count": 2,
  "finish_stop": 941,
  "finish_tool_calls": 1010,
  "finish_others": 47,
  "finish_others_detail": {
    "length": 47
  },
  "schema_validation_error_count": 34,
  "successful_tool_call_count": 976
}
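For what it's worth, those numbers are internally consistent: of the 1010 runs that finished with a tool call, 976 produced a schema-valid call, and the 34-call gap matches the schema validation error count exactly. A quick Python sanity check on the posted stats:

```python
# Stats copied from the K2-Vendor-Verifier run posted above.
stats = {
    "success_count": 1998,
    "failure_count": 2,
    "finish_stop": 941,
    "finish_tool_calls": 1010,
    "finish_others": 47,
    "schema_validation_error_count": 34,
    "successful_tool_call_count": 976,
}

# Invalid tool calls should equal the schema validation errors.
invalid = stats["finish_tool_calls"] - stats["successful_tool_call_count"]
print(invalid)  # 34

# Of the runs that ended in a tool call, what fraction were schema-valid?
valid_rate = stats["successful_tool_call_count"] / stats["finish_tool_calls"]
print(f"{valid_rate:.1%}")  # 96.6%
```

So roughly 1 in 30 tool calls fails schema validation on this serving stack, which lines up with the "works, but not decently enough for agents" experience described above.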