It looks like we’ll need to download the new Gemma 4 GGUFs

Posted by jacek2023@reddit | LocalLLaMA | View on Reddit | 110 comments

https://huggingface.co/unsloth/gemma-4-E2B-it-GGUF

https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF

by u/danielhanchen:

We just updated them again in response to:

  1. kv-cache : support attention rotation for heterogeneous iSWA https://github.com/ggml-org/llama.cpp/pull/21513
  2. CUDA: check for buffer overlap before fusing - CRITICAL fixes <unused24> tokens https://github.com/ggml-org/llama.cpp/pull/21566
  3. vocab : add byte token handling to BPE detokenizer for Gemma4 https://github.com/ggml-org/llama.cpp/pull/21488
  4. convert : set "add bos" == True for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21500
  5. common : add gemma 4 specialized parser https://github.com/ggml-org/llama.cpp/pull/21418
  6. llama-model: read final_logit_softcapping for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21390
  7. llama: add custom newline split for Gemma 4 https://github.com/ggml-org/llama.cpp/pull/21406