After Kimi K2 Is Released: No Longer Just a ChatBot
Posted by nekofneko@reddit | LocalLLaMA | View on Reddit | 36 comments
This post is a personal reflection penned by a Kimi team member shortly after the launch of Kimi K2. I found the author’s insights genuinely thought-provoking. The original Chinese version is here—feel free to read it in full (and of course you can use Kimi K2 as your translator). Here’s my own distilled summary of the main points:
• Beyond chatbots: Kimi K2 experiments with an “artifact-first” interaction model that has the AI immediately build interactive front-end deliverables—PPT-like pages, diagrams, even mini-games—rather than simply returning markdown text.
• Tool use, minus the pain: Instead of wiring countless third-party tools into RL training, the team awakened latent API knowledge inside the model by auto-generating huge, diverse tool-call datasets through multi-agent self-play.
• What makes an agentic model: A minimal loop—think, choose tools, observe results, iterate—can be learned from synthetic trajectories (a rough sketch of that loop follows this list). Today’s agent abilities are early-stage; the next pre-training wave still holds plenty of upside.
• Why open source: (1) Buzz and reputation, (2) community contributions like MLX ports and 4-bit quantization within 24 hours, (3) open weights preclude “hacky” hidden pipelines, forcing genuinely strong, general models—exactly what an AGI-oriented startup needs.
• Marketing controversies & competition: After halting ads, Kimi nearly vanished from app-store search, yet refused to resume spending. DeepSeek-R1’s viral rise proved that raw model quality markets itself and validates the “foundation-model-first” path.
• Road ahead: All resources now converge on core algorithms and K2 (with hush-hush projects beyond). K2 still has many flaws; the author is already impatient for K3.
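The “minimal loop” in the agentic-model bullet above is essentially the familiar tool-calling pattern. Here is a rough Python sketch of it; the `client.chat` interface and the tool registry are placeholders for illustration, not Kimi’s actual API or training setup:

```python
# Minimal agent loop sketch: think -> choose a tool -> observe the result -> iterate.
# `client` and `tools` are hypothetical placeholders, not Kimi's real interfaces.

def run_agent(client, tools, task, max_steps=10):
    """Drive a chat model through a simple tool-use loop."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # Think: ask the model what to do next, given the available tool schemas.
        reply = client.chat(messages=messages,
                            tools=[t.schema for t in tools.values()])
        messages.append(reply)

        # If the model answered directly, the loop is done.
        if not reply.get("tool_calls"):
            return reply["content"]

        # Choose tools: run each requested call and feed the observation back in.
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**call["arguments"])
            messages.append({"role": "tool",
                             "name": call["name"],
                             "content": str(result)})

    return "Stopped after max_steps without a final answer."
```

The point of the bullet is that whole trajectories through a loop like this can be generated synthetically (via multi-agent self-play, per the tool-use bullet) and learned from, instead of wiring countless real third-party tools into RL training.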
From the entire blog, this is the paragraph I loved the most:
A while ago, ‘Agent’ products were all the rage. I kept hearing people say that Kimi shouldn’t compete on large models and should focus on Agents instead. Let me be clear: the vast majority of Agent products are nothing without Claude behind them. Windsurf getting cut off by Claude only reinforces this fact. In 2025, the ceiling of intelligence is still set entirely by the underlying model. For a company whose goal is AGI, if we don’t keep pushing that ceiling higher, I won’t stay here a single extra day.
Chasing AGI is an extremely narrow, perilous bridge—there’s no room for distraction or hesitation. Your pursuit might not succeed, but hesitation will certainly fail. At the BAAI Conference in June 2024 I heard Dr. Kai-Fu Lee casually remark, ‘As an investor, I care about the ROI of AI applications.’ In that moment I knew the company he founded wouldn’t last long.
Tiny_Judge_2119@reddit
It's the first model trained for agentic use; I hope there are more to come. A 1T-parameter model is not really usable for the local LLM community, though.
RhubarbSimilar1683@reddit
I see 1T parameters as a wiki even if it can't be run locally by most. It helps democratize AI even if just in theory
tvmaly@reddit
Are there any LLM API providers offering it at an affordable price?
ELPascalito@reddit
OpenRouter offers a Kimi-k2:free version that you can use under the free daily quota; doesn't that count as a good price?
seunosewa@reddit
The non-free version on openrouter is also cheaply priced.
tvmaly@reddit
That is an amazing price.
Crosbie71@reddit
Thank you for the pointer!
I notice that OpenRouter automatically suggests some popular test queries (like counting the Rs in strawberry). It passes that test by working through it methodically. It still screws up on basic word counts: counting methodically but then reporting a false total.
TheRealMasonMac@reddit
It makes me wonder two things:
• How bad was Behemoth that Meta was too embarrassed to release it? It would have had twice the overall parameter count of K2.
• Maybe the rumors that R2 is a trillion parameters have some credibility.
No_Efficiency_1144@reddit
2 terabytes of VRAM in FP16 is so crazy
Appropriate_Web8985@reddit
It's not crazy when the road ahead is B300-tier clusters with 20 TB of combined HBM. K2 is meant to be run in datacenters; local will generally stay in the 10B-72B range because DDR isn't big enough to support huge models.
Relative_Rope4234@reddit
What is the overall best model in that range?
Corporate_Drone31@reddit
I think that space is still waiting for the killer model. Unless you want role-play, there doesn't seem to be a clear winner. Gemma 3 27B is a good generalist, the Qwen coders seem all right, and the 72B models seem to be the closest to being smart. 100B+ seems to be the level where they become more capable.
CardAnarchist@reddit
What's the clear winner if all you care about is roleplay?
No_Efficiency_1144@reddit
Yeah, it doesn't feel like there's a clear killer local model at the moment.
Maybe Gemma 3 27B QAT, but by now we have seen the limits of that model.
No_Efficiency_1144@reddit
My experience was that dollars go down fast when you rent B200s. You are right though.
I find it interesting that narrow, specialist 3B and 7B LLMs still do well compared to the massive models. I wonder if 3B and 7B will continue to scale. There must be some limit eventually.
Appropriate_Web8985@reddit
Compute is growing but will probably remain constrained, so closed-source labs will keep putting effort into good small specialist models and into routing queries to different types of models to serve different users economically. The similarity comes from log-linear returns to scaling: every 10x in scale buys a roughly constant linear increment of improvement. So 700B models are more like 2 orders of magnitude above 7B rather than 100x better in some linear way.
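To put the "orders of magnitude, not 100x" point in numbers, here is a tiny illustration that assumes quality grows roughly with log10 of parameter count; this is a stylized stand-in for a scaling law, not a fitted curve:

```python
import math

def scale_gap_decades(small_params: float, large_params: float) -> float:
    """Orders of magnitude (factors of 10) separating two model sizes."""
    return math.log10(large_params / small_params)

# 7B vs 700B: 100x the parameters, but only ~2 "decades" of scale,
# so under log-linear returns you expect two comparable increments of
# quality, not a 100x linear jump.
print(scale_gap_decades(7e9, 700e9))  # 2.0
```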
Caffdy@reddit
Kimi K2 was trained in FP8 like DeepSeek; you don't need 2 TB to run it.
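For reference, the memory figures in this sub-thread fall straight out of bytes per parameter; a rough weights-only estimate for a 1T-parameter model (ignoring KV cache and activation overhead):

```python
def weight_memory_tb(n_params: float, bytes_per_param: float) -> float:
    """Weights-only memory footprint in terabytes (1 TB = 1e12 bytes)."""
    return n_params * bytes_per_param / 1e12

print(weight_memory_tb(1e12, 2.0))  # FP16: ~2.0 TB
print(weight_memory_tb(1e12, 1.0))  # FP8:  ~1.0 TB
print(weight_memory_tb(1e12, 0.5))  # 4-bit quant: ~0.5 TB
```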
No_Efficiency_1144@reddit
Yeah I gave the FP16 number for dramatic effect lmao
samorollo@reddit
Understandable
Corporate_Drone31@reddit
It being open weights is a boon for two reasons:
• Even if most of the community cannot run it, you get the ability to deploy it using cloud/data center resources, or to pay for API usage from multiple vendors. This lets you control how restricted the model policy is, meet restrictive compliance/privacy needs (for internal deployment), apply token massaging (prefill, custom sampling strategies), feed your most valuable trade secrets into RAG without worrying, and control when the model is deprecated in favour of something "newer and better" (unlike OpenAI and Anthropic, who at best have a single partner that can deploy the model besides them).
• We initially couldn't even run Llama 7B with ease. Then quantisations and advancements in mixed CPU+GPU/SSD streamed-weights inference came along, and people started to be able to run ever larger models on existing hardware. If a 1T model is open, we can try all sorts of things: pruning, distillation, deleting experts, ideas we haven't come up with yet but will in 6 months.
So I argue that yes, for many it will not be practical. But it's runnable on the same hardware that is capable of running DeepSeek R1. And that hardware in turn doesn't cost all that much on the used market, if you are happy for responses to take half an hour (email) rather than 1 minute (chatting).
AppearanceHeavy6724@reddit
whoosh
101m4n@reddit
What?
AppearanceHeavy6724@reddit
Too many words, low substance.
101m4n@reddit
That's not a "whoosh" thing.
Briskfall@reddit
GOAT recognizes GOAT.
-p-e-w-@reddit
That’s a gross exaggeration, though. Many useful AI-based services are essentially classifiers, and many of the underlying tasks can be performed just fine with a 3.5B Qwen model.
Guandor@reddit
Are those agents?
No_Efficiency_1144@reddit
Yeah some definitions put the bar low enough.
I think “agent” is one of the least useful terms in ML though due to the enormously varying definitions
blackkettle@reddit
Thank you. I absolutely cannot stand this term - it is virtually useless.
IrisColt@reddit
Heh!
3dom@reddit
Amen.
TheRealMasonMac@reddit
Translation: https://pastebin.com/hgBG4Kdh
Amgadoz@reddit
Do we know what type of RL they used? There's no paper yet, so the information is spread across many places.
ilikepussy96@reddit
Wow. So QWEN loses out to Claude
yungfishstick@reddit
Slop
roselan@reddit
no u.