PSA: Gemma 4 template improvements
Posted by FastHotEmu@reddit | LocalLLaMA | View on Reddit | 39 comments
A PR was just merged that improves tool calls and dialog compliance. Make sure to update your jinja templates for better results.

aldegr@reddit
For llama.cpp, you'll have to wait for https://github.com/ggml-org/llama.cpp/pull/21704 before using this template.
theUmo@reddit
Stars didn't align for me. Applying this template to the GGUF I had downloaded caused total model failure. It looks like the GGUFs have been updated now though.
aldegr@reddit
There are some edge cases with the template that were missed (go figure). What client are you using to interact with llama.cpp?
theUmo@reddit
It was with OpenWebUI.
aldegr@reddit
See my comment here: https://www.reddit.com/r/LocalLLaMA/s/tVyTxz1C2E
Open WebUI hasn’t gotten the memo on how to properly handle reasoning for newer models.
TechSwag@reddit
To clarify:
Can I just save the new chat template and use --chat-template-file to point at the updated one until this commit goes through, right?
aldegr@reddit
I checked against the current master, and the template will work. The PR just aligns it better to the chat template. Sorry for the confusion.
aldegr@reddit
No, using this template on the current llama.cpp does not improve the model. You can either use the included interleaved template to achieve parity or wait for the PR to be merged in. Once the PR is merged, guidance is to use the updated template.
the__storm@reddit
lol, I need to get off reddit
akavel@reddit
Hmm so I'm now honestly kind of confused between all those template-related changes... So, in the end, can someone please help me understand:
With the current release (b8740), can I drop any extra --chat-template-file I tried before (and haven't tested if they actually work yet), re-download the GGUFs (26b-a4b bartowski & unsloth), and it will "just work"? Or not? Or not yet? Do the GGUFs need to be updated? Will they be?
Or is this not going to work, and I need to keep trying to wrangle some variant of --chat-template-file with some incarnation of a models/templates/google-gemma-31B-it-interleaved.jinja path in it?
finevelyn@reddit
If the gguf was updated with the new template, then you could drop --chat-template-file (the template is embedded inside the gguf file), but neither bartowski nor unsloth seems to be updated right now.
What you need to do is to download the official updated chat_template.jinja from here https://huggingface.co/google/gemma-4-26B-A4B-it/tree/main and use it with --chat-template-file. This replaces the google-gemma-31B-it-interleaved.jinja file provided by llama.cpp.
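Put together, the steps above look roughly like this. This is a sketch assuming a llama.cpp build with --jinja support and the hf CLI installed; the GGUF filename is a placeholder for whichever quant you downloaded:

```shell
# Fetch only the updated template from the official repo into a local dir
hf download google/gemma-4-26B-A4B-it chat_template.jinja --local-dir ./tmpl

# Point llama-server at it; --jinja enables jinja template rendering,
# and --chat-template-file overrides the template embedded in the GGUF
llama-server -m ./gemma-4-26B-A4B-it-Q4_K_M.gguf \
  --jinja \
  --chat-template-file ./tmpl/chat_template.jinja
```

Once the GGUFs themselves ship with the new template, the --chat-template-file override can be dropped.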
akavel@reddit
Thank you! And seems both are now updated in the end 😄
Clean_Hyena7172@reddit
Will this fix the issues with reasoning not working?
Rektile142@reddit
Late reply, but it does. Using the fresh template now, and Gemma 26B is successfully triggering and interleaving its chain of thought each run.
Thomasedv@reddit
Really hope this fixes my issue with Gemma stopping before it's really done working. Aside from some leaking of the template in calls, Gemma will say "I'll do X now" and then just abruptly stop.
It's very obvious when swapping to another model that seems a lot more agentic when it follows its process (in my case glm-4.7). Hopefully it also helps with the looping issues, the edit functionality breaking, and such!
ab2377@reddit
What are you using it with? I used opencode today and hit exactly the same problem: it's like "I will now use grep to look for it", then it just stops, as if waiting for its reply.
Thomasedv@reddit
Claude Code, and I think I tried Qwen Chat as well.
MoffKalast@reddit
> refuses to elaborate further
> eos token
Corosus@reddit
Yeah, it does this a LOT for me, falsely emitting a stop token or something. I'm curious if anyone has any interesting ways to work around it. I was considering making it constantly update its progress to a .md file and running opencode in a loop with 'do the thing in the file'.
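That loop idea can be sketched like this. Everything here is an assumption for illustration: the DONE marker convention is made up, and the `opencode run` non-interactive invocation at the bottom should be swapped for whatever your client actually offers:

```shell
# Re-invoke an agent until it marks the progress file as done,
# so a premature stop just means another pass through the loop.
resume_loop() {
  progress="$1"; shift
  [ -f "$progress" ] || echo "task not started" > "$progress"
  until grep -q "DONE" "$progress"; do
    # The agent command is expected to update $progress itself each run
    "$@" "Continue the task described in $progress; append DONE when finished."
  done
}

# Hypothetical usage with opencode's non-interactive mode:
# resume_loop progress.md opencode run
```

The obvious caveat: if the model never writes the marker, this loops forever, so a max-iteration counter is probably worth adding.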
TechSwag@reddit
Unfortunately this doesn't seem to always fix that, even when using the template by manually defining it in the config.
It is a lot better about both following instructions and calling tools, and is doing interleaved thinking properly now.
Borkato@reddit
Wait so do we have to redownload the models or…
I hope to god this is the final fix because I swear Gemma STILL has issues with my homegrown setup that qwen has 0 problems with
Sadman782@reddit
Why redownload the model? Just download the jinja file and use --jinja --chat-template-file
Borkato@reddit
Just in case. There are all kinds of fixes being done constantly
yrro@reddit
Hopefully the ggml-org GGUFs will be rebuilt with the new template, but until then you can run hf download google/gemma-4-26B-A4B-it chat_template.jinja and then use --chat-template-file ~/.cache/huggingface/hub/models--google--gemma-4-26B-A4B-it/snapshots/1db3cff1840c2ae59759d8e842ff37831cf8cb63/chat_template.jinja.
Voxandr@reddit
Same here. But gonna try with this fix.
RateRevolutionary370@reddit
Is it possible to get this to work with LM Studio? I'm copying the Jinja code into the prompt template box, but the model is saying "This message contains no content. The AI has nothing to say." I'm using Gemma 4 26b A4B (Q4_K_M).
winna-zhang@reddit
nice, was hitting some weird tool call formatting issues before
did you notice it actually improves consistency or just fixes edge cases?
Sadman782@reddit
it is way better now, try
BrianJThomas@reddit
I'm having luck with 31B now, but 26B still runs into issues for me.
Sadman782@reddit
try this jinja: https://pastebin.com/raw/hnPGq0ht
BrianJThomas@reddit
Oh nice. I didn't see that the original model was updated a few hours ago. Trying again...
Kodix@reddit
So just use the --chat-template-file flag with this new template with the newest self-compiled llama.cpp and that's all, yeah?
Probably this alone won't be enough to fix the model looping and the tool call issues/"I'll do x", but once those are fixed, this model's golden.
Sadman782@reddit
you can try: https://pastebin.com/raw/hnPGq0ht
it is much better for me
Kodix@reddit
Thank you! Where'd you get that from? Is that the interleaved template, or something else?
Sadman782@reddit
Google updated the official one a few hours ago: https://huggingface.co/google/gemma-4-26B-A4B-it/blob/main/chat_template.jinja, and I had Gemini fix it a bit further. That version is better than the updated official one, so you can try both and check which is better for you.
https://pastebin.com/raw/hnPGq0ht works better for me.
Voxandr@reddit
Gonna try.
Sadman782@reddit
It seems the official template still has issues; Gemini fixed it a bit and it seems better now. It is properly calling multiple tools, whereas before it was ignoring some tools and descriptions completely:
https://pastebin.com/hnPGq0ht
david_0_0@reddit
The tool call improvements are critical for agentic workloads. Worth noting, though: if you're running inference servers with cached jinja templates, the old format might break mid-stream. Did the PR maintain backward compatibility, or do existing quantized versions need rebuilding? Also curious whether the dialog compliance fixes affect instruction following, since tighter compliance sometimes reduces model creativity.
FoxiPanda@reddit
Google changed Gemma4 stuff again? I'm dying on the inside right now lol.