Kimi-Linear-48B-A3B-Instruct-GGUF Support - Any news?

[-]

zoyer2@reddit

OK, here are my first impressions of the model for code usage: tested on 2x3090 with 130k context, great speed at ~80 t/s. One-shotting seems kinda OK, and tool usage through Kilocode seems OK so far in a small/medium-sized project. I would say Qwen Next 80 A3B and GLM 4.7 Flash might be better. Need a bit more time, though.

Reply

[-]

Iory1998@reddit (OP)

Did you run it on llama.cpp?

Reply

[-]

zoyer2@reddit

Oops, accidentally removed that part. Yes, on llama.cpp! :)

Reply

[-]

ilintar@reddit

PR almost done, gonna come with another speedup to Qwen3Next as well.

Reply

[-]

kripper-de@reddit

https://github.com/ggml-org/llama.cpp/pull/18755

Reply

[-]

kripper-de@reddit

Kimi is coming!!!

Reply

[-]

LegacyRemaster@reddit

remember: you are our hero!

Reply

[-]

TheGlobinKing@reddit

Amazing, thanks! Is there a github issue I can follow for that Qwen3Next speedup?

Reply

[-]

Iory1998@reddit (OP)

Do you have any idea how much time is left before it gets merged?

Reply

[-]

Iory1998@reddit (OP)

Thank you for your hard work. Kimi-Linear is a good model. Please take good care of it :D

Reply

[-]

dinerburgeryum@reddit

The man right here folks

Reply

[-]

ilintar@reddit

Not my PR tho, just working with the author to make a common abstraction for delta net models.

Reply

[-]

dinerburgeryum@reddit

Yea you’re doing the work dude keep it up. 👍

Reply

[-]

Amazing_Athlete_2265@reddit

Fuck yeah

Reply

[-]

Ok_Warning2146@reddit

If you want to run it, you can clone my repo. It should be almost the same as the one that is going to be merged. https://huggingface.co/ymcki/Kimi-Linear-48B-A3B-Instruct-GGUF

Reply

[-]

Iory1998@reddit (OP)

Thank you. I use LM Studio. I'll wait for the update.

Reply

[-]

coder543@reddit

We have _so many_ A3B models... I really want some A1B and A5B options to mix things up.

Reply

[-]

R_Duncan@reddit

Granite 4.0 has an A1B model. As expected, it performs far worse than the A3B version.

Reply

[-]

coder543@reddit

Granite 4.0 MoEs (the A#B naming) come in 32B A9B and 7B A1B sizes. It is not shocking that such drastically different sizes would perform differently, yes. These are also very low-sparsity models. The rumor is that Gemini 3 Flash is a >1T model with a very low active parameter count. I have 128GB of medium-speed memory. I want a 200B A1B model that is released specifically in 4-bit precision (QAT, not PTQ). Extreme levels of sparsity, not 7B A1B.
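For a rough sense of scale, the sizes mentioned here can be compared with a back-of-the-envelope sketch. It assumes "sparsity" simply means total parameters divided by active parameters, takes the parameter counts straight from the model names, and counts weights only (no KV cache or runtime overhead). The helper names `sparsity_ratio` and `weight_size_gb` are made up for illustration.

```python
# Back-of-the-envelope sparsity and weight-size arithmetic for MoE models.
# Parameter counts come from the model names (e.g. "48B A3B"); nothing is measured.

def sparsity_ratio(total_b: float, active_b: float) -> float:
    """Total parameters divided by active parameters per token."""
    return total_b / active_b


def weight_size_gb(total_b: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return total_b * 1e9 * bits_per_weight / 8 / 1e9


# (total billions, active billions); the last entry is the wished-for model,
# not something that exists.
models = {
    "Granite 4.0 7B A1B": (7, 1),
    "Granite 4.0 32B A9B": (32, 9),
    "Kimi-Linear 48B A3B": (48, 3),
    "Qwen Next 80B A3B": (80, 3),
    "hypothetical 200B A1B": (200, 1),
}

for name, (total_b, active_b) in models.items():
    ratio = sparsity_ratio(total_b, active_b)
    size = weight_size_gb(total_b, 4)  # assumes exactly 4 bits per weight
    print(f"{name}: {ratio:.1f}x total/active, ~{size:.0f} GB at 4 bits/weight")
```

Under these assumptions, a 200B model at exactly 4 bits per weight needs about 100 GB for weights alone, which is why it is being sized against 128GB of memory.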

Reply

[-]

R_Duncan@reddit

I think you'd have to train such an unbalanced model yourself; max sparsity so far is 80B-A3B.

Reply

[-]

coder543@reddit

That's why I mentioned that higher sparsity models seem to exist, they're just not open weight, and that's why I want such a model. If companies keep releasing A3B, that's their choice, but it will be hard to get excited about that.

Reply

[-]

sloth_cowboy@reddit

Tell me where to start and I'll make them. BTW, I've never trained a model, and I don't have a server.

Reply

[-]

FullOf_Bad_Ideas@reddit

Get data from HF (FineWeb2/FinePDFs) and the HPLT3 project. Get comfortable with Megatron-LM and rent a 768xH100 node like [this one](https://gpulist.ai/detail/37a1aa3). Train a model with Megatron-LM on that node, then post-train with SFT, then do preference optimization with PPO/ORPO, and then do RL with GRPO in [slime](https://github.com/THUDM/slime). Hardware cost is the main limiting factor: I trained 4B A0.3B models on 60/80B tokens and it already cost thousands in compute. You'll need 10M to train a model successfully, but you can manage to do it on your own if you really want to, since so much of the stack is open source.

Reply

[-]

FullOf_Bad_Ideas@reddit

I want more A20B and A30B. 120B A30B would be good. 70B A20B too.

Reply

[-]

Not_Syslog@reddit

The only A5B I know of is gpt-oss:120b

Reply

[-]

KvAk_AKPlaysYT@reddit

Here are the current experimental Guf-Gufs: [https://huggingface.co/AaryanK/Kimi-Linear-48B-A3B-Instruct-GGUF](https://huggingface.co/AaryanK/Kimi-Linear-48B-A3B-Instruct-GGUF) Keep in mind that you'd have to run it through the [**PR #17592**](https://github.com/ggml-org/llama.cpp/pull/17592) and not the master branch.

Reply

[-]

alhinai_03@reddit

I'm currently running the model with your branch, it's very promising, but I couldn't find any recommended inference settings, like temp, top-p, top-k. Any idea?

Reply

[-]

Iory1998@reddit (OP)

I use LM Studio, which is a few weeks behind the latest llama.cpp update. However, Kimi-Linear is an important model, and I think once it's merged into the main branch, LM Studio will quickly update their platform to support it. Do you have any idea how much time is left before it gets merged?

Reply

[-]

BasketFar667@reddit

They're rapidly declining due to restrictions and the fact that they're not fully open source. Qwen is winning, DeepSeek is winning too, and Kimi is lagging behind overall. Gemini is improving, but not by much. If GA reaches 3.0, it'll improve.

Reply

[-]

mr_zerolith@reddit

Good question, this one seems to have just been forgotten about

Reply

[-]

Iory1998@reddit (OP)

It took so long. I wish we could just get an update.

Reply

[-]

kaisurniwurer@reddit

You can always check git. Though I approve of you trying to generate hype too, since I'm personally interested.

Reply

[-]

Iory1998@reddit (OP)

Generate hype, he said.... with a one-line post! 🤦‍♂️🤦‍♂️

Reply

[-]

AnomalyNexus@reddit

Would this fit on a 24GB card? Guessing only with offload.

Reply

[-]

Iory1998@reddit (OP)

Well, it's a MoE, so it would still be fast.
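The arithmetic behind the 24GB question can be sketched roughly as follows. This is only an estimate under stated assumptions: the bits-per-weight values are illustrative rather than the sizes of any specific GGUF quant, `reserve_gb` is a hypothetical placeholder for KV cache and compute buffers, and `fits_in_vram` is a made-up helper name.

```python
# Rough check of whether a 48B-parameter model's weights fit in a given VRAM budget.
# All inputs are assumptions for illustration, not measured file sizes.

def fits_in_vram(total_params_b: float, bits_per_weight: float,
                 vram_gb: float, reserve_gb: float = 2.0):
    """Return (approximate weight size in GB, whether weights + reserve fit)."""
    weights_gb = total_params_b * bits_per_weight / 8  # billions of params -> GB
    return weights_gb, weights_gb + reserve_gb <= vram_gb


for bpw in (3.0, 4.0, 4.5, 8.0):  # illustrative bits-per-weight values
    size_gb, ok_single = fits_in_vram(48, bpw, vram_gb=24)
    _, ok_dual = fits_in_vram(48, bpw, vram_gb=48)  # e.g. two 24GB cards
    print(f"{bpw} bpw: ~{size_gb:.1f} GB weights, "
          f"fits 24GB: {ok_single}, fits 48GB: {ok_dual}")
```

Under these assumptions, 4-bit weights alone come to about 24 GB, which is consistent with the guess that a single 24GB card would need some offload unless a smaller quant is used.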

Reply

[-]

nuclearbananana@reddit

Do we have any other benchmarks besides Context Arena? That one is too specific to draw general conclusions from.

Reply

[-]

Amazing_Athlete_2265@reddit

https://old.reddit.com/r/LocalLLaMA/comments/1pvvv8m/kimilinear_support_in_progress_you_can_download/

Reply
