kv-cache : support attention rotation for heterogeneous iSWA by ggerganov · Pull Request #21513 · ggml-org/llama.cpp

Posted by jacek2023@reddit | LocalLLaMA | 16 comments

tl;dr: Fixes KV-cache rotation for hybrid-attention models, i.e. models using heterogeneous interleaved sliding-window attention (iSWA), like Gemma 4
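
For context, "KV-cache rotation" here refers to shifting the positions of already-cached keys when the context slides, so tokens don't have to be re-encoded from scratch. Since RoPE encodes position as a rotation, moving a cached key from position p to p + delta only requires rotating each 2-D pair of its components by the angle corresponding to delta. Below is a minimal sketch of that idea, not the PR's actual code: the function name `rope_shift`, the `freq_base` default, and the interleaved-pair RoPE layout are assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: re-rotate one cached, RoPE'd key vector after its
// position shifts by `delta` tokens (e.g. after evicting old tokens from a
// sliding window). Assumes an interleaved-pair RoPE layout and even dim.
void rope_shift(std::vector<float> &k, int delta, float freq_base = 10000.0f) {
    const size_t d = k.size(); // head dimension
    for (size_t i = 0; i < d; i += 2) {
        // per-pair frequency: theta_i = freq_base^(-i/d)
        const float theta = std::pow(freq_base, -(float) i / (float) d);
        const float angle = (float) delta * theta;
        const float c = std::cos(angle);
        const float s = std::sin(angle);
        const float x0 = k[i];
        const float x1 = k[i + 1];
        // standard 2-D rotation of the pair by `angle`
        k[i]     = x0 * c - x1 * s;
        k[i + 1] = x0 * s + x1 * c;
    }
}
```

The "heterogeneous" part is presumably what makes this non-trivial: in an iSWA model, sliding-window layers keep only the most recent W tokens in their KV cache while full-attention layers keep the entire context, so a single global shift no longer fits every layer and the rotation has to respect each layer's attention type.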