TheRealMasonMac@reddit
https://www.reddit.com/r/LocalLLaMA/comments/1s6stgl/kimi_k26_will_drop_in_the_next_2_weeks_k3_is_wip/
LMAO this is funny in hindsight
-p-e-w-@reddit
I mean to be fair 95% of such claims are BS.
If a dowser happens to stumble upon water, you don’t conclude that dowsing works.
oroora6@reddit
Maybe you're just not dowsing hard enough.
I tested it using the orientation of the dowser and Google Maps. 10/10 times I would have eventually found an ocean if I kept going in the direction the dowser was pointing.
HardworkPanda@reddit
He's just a dumb top poster; he thinks he knows everything even though he's never tried it.
mouseynaides@reddit
That guy really just randomly leaked kimi k2.6. what a goat
DerDave@reddit
He got so much shit for his post. Poor bastard didn't even lie.
pneuny@reddit
To be fair, it's hard to know if it's true when it isn't backed up with a source.
DerDave@reddit
He actually said he had a source (a buddy working at moonshot.ai), only he couldn't prove it haha
MoodDelicious3920@reddit
Almost everyone abused him in the replies, saying things like "Who r u to say" 😂
wazymandias@reddit
The model size creep is wild. A year ago 70B was "huge" and now we're seeing 400B+ models that require enterprise setups. At some point the local crowd is going to hit a hard wall where you literally can't run the frontier models locally anymore. That's when things get interesting...
the_omicron@reddit
Don't worry, Gemma 4 26B A4B is pretty good already for non-coding tasks.
KeinNiemand@reddit
Well, bigger simply = better. Yes, efficiency, as in how much smarts you can get out of a given size, can improve, but every efficiency improvement can be used in two ways: A) a smarter model at the same size, or B) a smaller model with the same smarts.
As long as models keep scaling and getting better with more parameters, whatever the frontier is will always tend to get larger and larger.
Successful-Brick-783@reddit
GPT-3 has 175 billion parameters and was released 6 years ago, idk why you think 400B is wild
Caffdy@reddit
enterprise and server are gonna hit walls as well; not the same ones, but they too have limits in inference and size
pr3miere@reddit
It just dropped!
No_Conversation9561@reddit
I yearn for the days when “dropped” meant dropped weights on huggingface.
VEHICOULE@reddit
I'm interested in their Moderato subscription, would you mind sharing your impressions?
vincentz42@reddit
Not OP, but I'm on the Moderato plan. Here's a brief review:
Pros:
K2.5 was probably the best open model available. It was my daily coding model. I used it for roughly 80-90% of my tasks and delegated the rest to Codex 5.3/5.4 on my ChatGPT Plus plan. I don't use Claude because I don't like the company's values, and I find Codex better for my use case anyway.
Their Moderato plan is very generous and should be sufficient for most people. I burned through 34M tokens last week and still had ~50% of my weekly quota left. Kimi CLI also feels more token-efficient than Claude Code IMHO.
I've only used K2.6 for a couple of hours, but it does feel like a noticeable improvement over K2.5.
Cons:
The Kimi CLI team appears to have undergone some personnel changes, with several of the original developers having left. Since then, "new features" have been added to the CLI, but the model wasn't trained to use them, resulting in frequent tool-calling errors. It got bad enough that I had to downgrade the CLI version. The team also seems less responsive to GitHub issues than before. K2.6 mitigates some of these problems, but they're not fully resolved.
I suspect Kimi will gradually become less open and transparent, similar to Anthropic and OpenAI. The latest CLI version redacts thinking traces from the terminal and VS Code (though they're still visible in logs and CLI web sessions), and support for "encrypted thinking traces" has been added (not yet enabled). It's also unclear whether K2.6 will be open weights at all. If supporting open model development is part of your goal, that's worth keeping in mind.
Clear-Ad-9312@reddit
Have you tried it through the OpenCode CLI? It seems to perform better. Also, you seem to be able to talk to both K2.6 and the K2.6-code-preview models through the API. I wonder how different the two models are.
sjsosowne@reddit
Does the 34M include cached input?
Reason I ask is I use gpt-5.4 for my day job at the moment and my company supplies API access. Because we have essentially no limit, it's easy to burn through 200-300M cached input tokens a day, sometimes more. My usage last week was 1.5B.
I'm looking to move to a more open model if possible... But we are Azure-based, so unless it's on Foundry we will have to use a subscription, and I reckon we're not going to find one that covers it.
TheRealMasonMac@reddit
On the bright side, kimi-cli is very hackable! I have my own fork with changes.
TokenChingy@reddit
Oh my god, no wonder Kimi felt much more capable today.
DerDave@reddit
Did it? Are you sure you already had it all day?
DerDave@reddit
Where is this?
Dany0@reddit
Y'all missed the most important detail: Kimi K2.6 Code.
It's a code-focused finetune! Maybe they looked at Mythos and thought "we can do that too".
Clear-Ad-9312@reddit
You seem to be able to talk to both K2.6 and the K2.6-code-preview models through the API.
I wonder how different the two models are.
Guardian-Spirit@reddit
> Maybe they looked at Mythos and thought we can do that too
Training takes way longer than that.
seamonn@reddit
not if you distill it
zdy132@reddit
Guys someone is paying me to answer questions, am I being distillation attacked?
seamonn@reddit
just ask questions back instead of answering.
zdy132@reddit
I will just tell them "Go to sleep." That's a good trick.
Dany0@reddit
Finetune
Orolol@reddit
Finetune is training. It still takes longer than that on a 1T model.
KickLassChewGum@reddit
It'll heavily depend on how you fine-tune. For a 1T model, you could curate a manageable high-quality corpus of a hundred thousand examples or so and run it through SFT.
If you want to LoRA towards a more specific use case, you could try half of that again. Depends on the gains you're looking for.
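For the LoRA route, the general shape looks something like the sketch below (just illustrative, using peft + trl; the model id, dataset file, and hyperparameters are placeholders, not anyone's actual recipe, and a 1T model would need multi-node infrastructure rather than a single box):

```python
# Rough LoRA SFT sketch with peft + trl. Everything here is a placeholder:
# the model id and dataset file are made up, and a 1T-parameter model would
# need multi-node training infrastructure, not this single-process script.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Curated corpus: one JSON object per line with a "text" field containing a
# fully formatted training example (prompt + response).
dataset = load_dataset("json", data_files="curated_sft.jsonl", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="some-org/some-base-model",  # placeholder model id
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="lora-out",
        num_train_epochs=2,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
    ),
)
trainer.train()
```

Scaling that up to a hundred thousand curated examples on a 1T MoE is mostly an infrastructure problem; the workflow itself is basically this.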
Dany0@reddit
Exactly. And Mythos has been teased ever since it was available to Anthropic employees (March, iirc?)
Clear-Ad-9312@reddit
You can actually talk with Kimi K2.6 through the API
Due_Net_3342@reddit
another model that I cannot run even on my 144GB setup :)
SilentLennie@reddit
Pretty certain that with llama.cpp you can put as much as possible in VRAM, overflow to RAM as needed, and even overflow to loading from disk. With MoE, that should help a whole bunch.
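Roughly what that looks like via llama-cpp-python, for reference (the GGUF filename and layer count are made-up placeholders; tune them to your hardware):

```python
# Sketch of the llama.cpp offload story via llama-cpp-python.
# The GGUF filename and layer count below are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="kimi-k2.6-Q2_K.gguf",  # hypothetical quant filename
    n_gpu_layers=20,   # offload as many layers as fit in VRAM (-1 = all)
    use_mmap=True,     # default: weights that don't fit in RAM get paged in from disk
    n_ctx=8192,
)

out = llm("Write a haiku about offloading.", max_tokens=48)
print(out["choices"][0]["text"])
```

The llama.cpp CLI also has --override-tensor / -ot, which people use to keep the MoE expert tensors on CPU while attention and shared layers stay on the GPU; that's where most of the MoE offload win comes from.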
UpperParamedicDude@reddit
It's insane how fast models are getting bigger. I have 36GB of VRAM plus 64GB of DDR4 RAM and I'm already memory-poor for the majority of great models that come out these days. In BitNet we believe; hope it'll become more common.
Limp_Classroom_2645@reddit
It's also insane how models are getting smaller and more capable. I tried Qwen 3.5 4B on a 6GB VRAM card and I was shocked at how good it is for its size; it felt like a 20B model from a year ago.
alphapussycat@reddit
Yes, I tried the Opus-distilled ones. Only Qwen3.5 27B at q4_k_m was borderline fine. Then I tried the non-distills, and even the 9B at q4 is better than the 27B Opus distill.
I've found the 9B to be very impressive, with my very limited testing. I'd say it also outperforms 35B A3B.
Last time I tried any, like Qwen2.5, they were unusable.
I really really hope we get the 3.6 version too...
Potential-Gold5298@reddit
In my experience, "distilled models" remain roughly the same at best, and at worst, they become dumber. All this "Claude x100500 reasoning" is mainly good for improving writing style, but not for enhancing intelligence. Alibaba or Google likely distill their own large models, and do it more efficiently, so third-party distillation adds nothing.
ayylmaonade@reddit
Yep. I don't understand all the sudden hype around these Claude distills. Those silly "3000x brainstorm" Claude/Gemini/GPT reasoning datasets have been getting distilled into all sorts of models for 6+ months now, and I can't think of a single one that was even on par with the original base model. Before 3.5, I tried a good few Qwen3 distills of this type and every single one of them was worse than the default Qwen.
Kodix@reddit
Exactly my experience, as well. Their popularity seems to be hype-based rather than performance-based.
IrisColt@reddit
You nailed it. Instruction following is always damaged significantly.
Limp_Classroom_2645@reddit
My sentiment as well
TheRealMasonMac@reddit
It's because an SFT dataset, unless carefully crafted, will undo the RL training the model underwent.
Objective-Stranger99@reddit
Qwen3.5 4B currently beats GPT OSS 20B in most benchmarks.
FaceDeer@reddit
There are benefits to models of all sizes becoming open, even if I can't run them locally. As we've been seeing with the recent fiascos involving Anthropic nerfing or locking away their APIs, it's important to know that other big companies can provide access to those models regardless of what the models' originators decide to do with them.
soyalemujica@reddit
What model do you rank as the best for coding with that setup?
UpperParamedicDude@reddit
Hmm, depends on your needs, I think. At the moment I use this model for most of my stuff:
Jackrong/Qwopus3.5-27B-v3-GGUF
I could easily run some ~4bpw 122B-A10B finetune, but speed, free memory for desktop usage, and fitting some image-gen models into VRAM at the same time all matter to me. Honestly, idk if the model I'm using is even good by rolling standards, but it does almost everything I need right now and I'm content with it.
alphapussycat@reddit
Don't bother with the Opus distills; compared to non-distills they're lobotomized, it's like night and day.
Ok_Technology_5962@reddit
Another model I can probably barely (maybe not) run on my 512GB setup... I'd have to RPC systems together or do disk offload.
IrisColt@reddit
heh
alphapussycat@reddit
But still good to have. One day maybe you'll get the CPU and RAM to run it... You're probably never running this on VRAM though.
LagOps91@reddit
152GB total here (RAM+VRAM)... it's great to have it, but it's still far from enough. We all need some of that 1-bit model magic.
Mashic@reddit
But smaller models are getting better too.
Aggressive-Permit317@reddit
Kimi dropping 2.6 already? This is moving stupid fast. I’ve been running Kimi variants locally and the context handling + tool use has been surprisingly clean. Anyone got early leaks on what’s actually new in this one or are we waiting for the official drop to benchmark it against Gemma 4 and Qwen 3.5?
nuclearbananana@reddit
Hey can you give me a recipe for banana bread
Different_Fix_2217@reddit
K3 will probably be great https://www.youtube.com/watch?v=2IfAVV7ewO0
nuclearbananana@reddit
That's a very clickbaity title... and in the time it takes to watch the video you could just read the paper yourself.
DerDave@reddit
Yeah, really looking forward to K3. So many nice innovations from open-source labs: Residual Attention, Engram, etc. Also can't wait for big models to adopt the idea of dflash diffusion speculative decoding...
pneuny@reddit
Great timing with Anthropic nerfing Opus 4.6 due to capacity issues.
WPBaka@reddit
Hype! Kimi K2.5 is one of my favorite models. Something about it just feels unique compared to other releases IMO. I really like its prose too.
MoodDelicious3920@reddit
I think Kimi K2.5 is the only model currently comparable to proprietary SOTA, especially for general STEM, non-coding tasks.
TraditionalAdagio841@reddit
Another high-value model, great!
silenceimpaired@reddit
How I wish I could run the model.
B89983ikei@reddit
We could barely use 2.5!
segmond@reddit
Tell us when it's on Hugging Face.
Shockersam@reddit
Ok astronaut
pigeon57434@reddit
I hope it's not just code and there will also be a Kimi K2.6.
muyuu@reddit
This will be for very high-end setups, but still very exciting if they can keep up the improvements shown in their earlier releases. It's huge to have something really good that can be run with significant resources but without depending on any particular vendor's shenanigans.
Canchito@reddit
Hopefully they won't inflate the API pricing like GLM did...
RetiredApostle@reddit
Not in the first week.
Tall-Ad-7742@reddit
I am excited!!!