attn-rot (TurboQuant-like KV cache trick) lands in llama.cpp

Posted by Dany0@reddit | LocalLLaMA

Roughly 80% of the benefit of TurboQuant with almost no downsides. A Q8 KV cache is now ≈ F16.
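For context, a minimal sketch of the general idea behind rotation-based KV cache quantization (this is an illustration, not llama.cpp's actual implementation): multiplying K/V vectors by an orthonormal Hadamard matrix before int8 quantization spreads outlier channels across all dimensions, so the per-vector scale wastes far less precision. The dimensions, outlier pattern, and helper names below are assumptions for the demo.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthonormal: H @ H.T == I

def quant_dequant_int8(x):
    # Symmetric per-vector int8 quantization, then dequantize.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(0)
n = 64
x = rng.normal(0.0, 0.1, n)
x[rng.choice(n, 4, replace=False)] += 10.0  # a few outlier channels, typical of K/V

H = hadamard(n)
err_plain = np.mean((quant_dequant_int8(x) - x) ** 2)
# Quantize in the rotated domain, rotate back, compare to the original.
err_rot = np.mean((H.T @ quant_dequant_int8(H @ x) - x) ** 2)

print(f"plain int8 MSE:   {err_plain:.3e}")
print(f"rotated int8 MSE: {err_rot:.3e}")
```

Because the rotation is orthogonal it can be folded away (or inverted exactly), so quality is measured against the same original vectors; the rotated-domain quantization error is typically several times smaller, which is why Q8 can land so close to F16.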