Kimi K2.6 - the mighty turtle that wins the race

Posted by cjami@reddit | LocalLLaMA | View on Reddit | 27 comments

Hi folks, I've been benching Kimi K2.6 for the past few days, and I'd like to share my findings.

For context, this is based on a benchmark I've created that pits models against each other in autonomous games of Blood on the Clocktower - a highly complex social deduction game.

Findings:

K2.6 has played 64 games so far (2 games per match). These are early results, but it has absolutely dominated the leaderboard through consistent wins against other models.

K2.6 is slow, generating an average of 570,000 tokens per game; Gemini 3.1 Pro, for contrast, generates 180,000 tokens per game. A typical match takes about 1-3 hours, but with K2.6 it takes about 10-15 hours (using Moonshot AI as the provider).

K2.6 is expensive - mainly due to its high token output - at $2.31/game. That is still significantly less than Claude Opus 4.6 at $3.79/game, while GLM 5.1 costs a more modest $0.88/game.
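For a rough sense of scale, the figures above imply a blended price per million tokens and an average generation speed. This is a back-of-envelope sketch only: it ignores input-token costs, uses the midpoint of the 10-15 hour range, and assumes that range refers to a full two-game match.

```python
# All figures taken from the post; everything else is an assumption.
TOKENS_PER_GAME = 570_000       # K2.6 average output per game
COST_PER_GAME = 2.31            # USD per game
GAMES_PER_MATCH = 2
HOURS_PER_MATCH = (10 + 15) / 2  # midpoint of the stated 10-15h range

# Implied blended cost per 1M tokens (ignores input-token billing).
cost_per_m = COST_PER_GAME / (TOKENS_PER_GAME / 1e6)

# Implied average generation speed across a whole match.
tokens_per_sec = (TOKENS_PER_GAME * GAMES_PER_MATCH) / (HOURS_PER_MATCH * 3600)

print(f"~${cost_per_m:.2f} per 1M tokens, ~{tokens_per_sec:.1f} tok/s")
```

That works out to roughly $4 per million output tokens and an average throughput in the mid-20s of tokens per second, which is consistent with the "mighty turtle" framing: slow per token, but the per-token price is what keeps the per-game cost below Opus.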

Reliability is decent, with a 0.9% tool-call error rate.

Notable moves:

Notable mistakes:

Kimi K2.6 transcripts: https://clocktower-radio.com/search?a=Kimi+K2.6

How-it-works: https://clocktower-radio.com/how-it-works