PSA: Kimi K2 Thinking seems to currently be broken for most agents because of tool calling within its thinking tags
Posted by lemon07r@reddit | LocalLLaMA | 2 comments
Yeah, just what the title says. If any of you are having issues coding with K2 Thinking, it's because of this. Only Kimi CLI really supports it atm. MiniMax M2 had a similar issue, and I think maybe GLM 4.6 too (not sure, don't quote me on this), but hopefully most agents will have this fixed soon. I think this is called interleaved thinking, or something similar to that? Feel free to shed some light on this in the comments if you're more familiar with what's going on.
teachersecret@reddit
It’s built to do multiple tool calls in the same response, and to build upon those tool calls to answer.
This means you have to stop generation after each tool call it makes while thinking, run the call, append the result, and resume with a prefix completion that is still in thinking mode. If you do this, it can churn through multiple calls before giving you its final answer.
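Roughly, that stop / run / append / resume loop could look like the sketch below. Treat it as an illustration only: the `<think>`, `<tool_call>` and `<tool_result>` markers, the `complete` callback, and the `run_interleaved` and `fake_complete` names are all made up for this example and are not Kimi's actual chat template or any provider's API. `fake_complete` is a scripted stand-in for the model so the loop can run without a server.

```python
import json
import re
from typing import Any, Callable, Dict

# Hypothetical markers; a real model uses its own special tokens.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def run_interleaved(
    prompt: str,
    complete: Callable[[str], str],
    tools: Dict[str, Callable[..., Any]],
    max_calls: int = 8,
) -> str:
    """Drive a model that emits tool calls inside its thinking block.

    `complete(text)` is assumed to return the continuation of `text`,
    stopping either right after a closing </tool_call> or at the end
    of the final answer.
    """
    transcript = prompt + "<think>"
    for _ in range(max_calls):
        chunk = complete(transcript)
        transcript += chunk
        match = TOOL_CALL_RE.search(chunk)
        if match is None:
            # No tool call in this chunk: the model finished its answer.
            return transcript
        call = json.loads(match.group(1))
        result = tools[call["name"]](**call["arguments"])
        # Append the result and resume as a prefix completion. The thinking
        # block has not been closed, so the model keeps reasoning.
        transcript += "<tool_result>" + json.dumps(result) + "</tool_result>"
    raise RuntimeError("tool-call budget exhausted")


def fake_complete(text: str) -> str:
    """Scripted stand-in for a model: one tool call, then the answer."""
    if "<tool_result>" not in text:
        return (
            "I need the sum."
            '<tool_call>{"name": "add", "arguments": {"a": 2, "b": 3}}</tool_call>'
        )
    return "The tool says 5.</think>2 + 3 = 5"


out = run_interleaved("What is 2 + 3?", fake_complete, {"add": lambda a, b: a + b})
print(out.split("</think>")[-1])
```

The key point is that the tool result gets appended to the same unfinished turn instead of starting a new one. An agent that closes the turn at the first tool call, or strips the thinking text before continuing, would break this kind of model.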
That said… my preliminary testing left me disappointed. I watched it call things that had absolutely nothing to do with the question at hand, and go off on some nonsense. It’s cool that it can chain hundreds of tool calls but less useful when half of them are randomly checking the weather.
I might give it a bit more testing tomorrow and toss up an example repo.
SrijSriv211@reddit
Yeah, interleaved thinking is what I suspect is causing a lot of the issues. Not every provider supports it. I might be wrong, but I think Claude Code can also support it if you hack it in somehow, idk really. I heard this in a video. I think it'll be fixed soon, and I also think many more models will adopt the interleaved thinking style.