Kimi Linear: 30% gain in prompt processing (pp) and higher context merged into llama.cpp
Posted by Ok_Warning2146@reddit | LocalLLaMA | 13 comments
[https://github.com/ggml-org/llama.cpp/pull/19827](https://github.com/ggml-org/llama.cpp/pull/19827)
I accidentally found that changing just one line boosts prompt processing by 30% and raises the maximum context for IQ3\_M on a 3090 from 192k to 300k tokens.
It would be great if people with a 5090 could report how much context they get at various quants.
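
For anyone comparing their own numbers against the ones above, here is a minimal Python sketch of the arithmetic. It assumes the 30% figure refers to prompt processing throughput (tokens per second) and treats 192k and 300k as 192,000 and 300,000 tokens; the helper names `relative_gain` and `time_fraction` are made up for illustration and are not part of llama.cpp.

```python
def relative_gain(before: float, after: float) -> float:
    """Return the fractional increase going from `before` to `after`."""
    return (after - before) / before


def time_fraction(throughput_gain: float) -> float:
    """Fraction of the original wall time needed for the same prompt
    when throughput rises by `throughput_gain` (0.30 means +30%)."""
    return 1.0 / (1.0 + throughput_gain)


# Numbers from the post (assumed units: tokens of context, tokens/s for pp).
context_gain = relative_gain(192_000, 300_000)
pp_time = time_fraction(0.30)

print(f"context increase: {context_gain:.2%}")
print(f"prompt processing time relative to before: {pp_time:.2%}")
```

Under those assumptions, the context jump is a bit over half again as much as before, and the same prompt should take roughly three quarters of the previous processing time. Swapping in your own before/after values gives figures that are directly comparable across cards and quants.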