Kimi 2.6 question

Posted by vhthc@reddit | LocalLLaMA

I am aware that this is kinda a dumb question, but I think I am missing something.

Kimi 2.6 is a 1.1T-parameter model with 30B active parameters. It is stored in INT4, so its size is ~600GB.

So with 768GB of RAM and 2x3090 (= 48GB VRAM) it should be possible to run this, right? ~600GB of weights in RAM, ~18GB of active parameters in VRAM, and a context of 100-200k tokens should fit in the remaining ~30GB of VRAM.
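To show where these numbers come from, here is a rough sketch of the arithmetic in Python. It assumes exactly 4 bits per weight and decimal GB, and it ignores quantization scales, KV cache and runtime overhead, so the raw results come out a bit below the rounded figures above. The function and variable names are just for illustration.

```python
# Back-of-the-envelope memory budget using the numbers from the post.
# Assumptions: exactly 4 bits per weight, decimal GB, no overhead for
# quantization scales, KV cache or the runtime itself.

BITS_PER_WEIGHT = 4  # INT4


def weights_gb(params: float, bits: int = BITS_PER_WEIGHT) -> float:
    """Size of `params` weights at `bits` bits each, in decimal GB."""
    return params * bits / 8 / 1e9


total_gb = weights_gb(1.1e12)  # all 1.1T parameters
active_gb = weights_gb(30e9)   # 30B parameters active per token
vram_gb = 2 * 24               # 2x RTX 3090 with 24GB each
ram_gb = 768

print(f"total weights : {total_gb:.0f} GB")
print(f"active weights: {active_gb:.0f} GB")
print(f"VRAM left     : {vram_gb - active_gb:.0f} GB")
print(f"fits in RAM   : {total_gb <= ram_gb}")
```

The gap between the raw 550GB / 15GB and my ~600GB / ~18GB is the headroom I am allowing for everything this sketch leaves out.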

I don't expect the speed will be great - maybe 10 t/s?
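A rough way to sanity-check that guess, as a sketch only: assume decoding is limited by memory bandwidth and that every active weight is read once per token. Both are simplifying assumptions on my part, not measurements. The bandwidth values in the loop are hypothetical placeholders, not specs of any particular CPU or GPU.

```python
# Rule-of-thumb decode speed, ASSUMING generation is memory-bandwidth bound
# and each token reads all active weights exactly once.

ACTIVE_GB = 30e9 * 4 / 8 / 1e9  # 30B active parameters at 4 bits = 15 GB


def tokens_per_s(bandwidth_gb_s: float, active_gb: float = ACTIVE_GB) -> float:
    """Upper bound on tokens/s for a given effective memory bandwidth."""
    return bandwidth_gb_s / active_gb


def required_bandwidth_gb_s(target_tps: float, active_gb: float = ACTIVE_GB) -> float:
    """Effective bandwidth needed to reach `target_tps` tokens/s."""
    return target_tps * active_gb


print(f"needed for 10 t/s: {required_bandwidth_gb_s(10):.0f} GB/s")

# Hypothetical effective bandwidths, for illustration only.
for bw in (100, 200, 400):
    print(f"{bw} GB/s -> at most {tokens_per_s(bw):.1f} t/s")
```

Under these assumptions, 10 t/s needs roughly 150 GB/s of effective bandwidth wherever the active weights actually end up being read from.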

I think 2x3090 (or more) is something a lot of people here on the sub have available. The 768GB of RAM is the harder problem, but before the RAM price spike this was about $2,500 (12x 64GB DDR5 sticks at ~$200 each). So besides needing a premium CPU and motherboard to support that much RAM, this sounds to me like a machine a lot of people could run locally. I would call it "advanced hobbyist" price range :-)

So why are people saying Kimi 2.6 is not "local" for most people? Am I missing something? (Serious question: I do not have a 768GB RAM machine, but I am tempted once prices come down at some point.)

Thanks!