PSA: Gemma 4 template improvements
Posted by FastHotEmu@reddit | LocalLLaMA | View on Reddit | 39 comments
A PR was just merged that improves tool calls and dialog compliance. Make sure to update your jinja templates for better results.

aldegr@reddit
For llama.cpp, you'll have to wait for https://github.com/ggml-org/llama.cpp/pull/21704 before using this template.
theUmo@reddit
Stars didn't align for me. Applying this template to the GGUF I had downloaded caused total model failure. It looks like the GGUFs have been updated now though.
aldegr@reddit
There are some edge cases with the template that were missed (go figure). What client are you using to interact with llama.cpp?
theUmo@reddit
It was with OpenWebUI.
aldegr@reddit
See my comment here: https://www.reddit.com/r/LocalLLaMA/s/tVyTxz1C2E
Open WebUI hasn’t gotten the memo on how to properly handle reasoning for newer models.
TechSwag@reddit
To clarify:
Can I just save the new chat template and use --chat-template-file to point at the updated one until this commit goes through, right?
aldegr@reddit
I checked against the current master, and the template will work. The PR just aligns it better to the chat template. Sorry for the confusion.
aldegr@reddit
No, using this template on the current llama.cpp does not improve the model. You can either use the included interleaved template to achieve parity or wait for the PR to be merged in. Once the PR is merged, guidance is to use the updated template.
the__storm@reddit
lol, I need to get off reddit
akavel@reddit
Hmm so I'm now honestly kind of confused between all those template-related changes... So, in the end, can someone please help me understand:
With the current release (b8740), can I drop any extra --chat-template-file I tried before (and haven't tested if they actually work yet), re-download the GGUFs (26b-a4b bartowski & unsloth), and it will "just work"? Or not? Or not yet? Do the GGUFs need to be updated? Will they be?
Or is this not going to work, and I need to keep trying to wrangle some variant of --chat-template-file with some incarnation of a models/templates/google-gemma-31B-it-interleaved.jinja path in it?
finevelyn@reddit
If the gguf was updated with the new template, then you could drop --chat-template-file (the template is embedded inside the gguf file), but neither bartowski nor unsloth seems to be updated right now.
What you need to do is to download the official updated chat_template.jinja from here https://huggingface.co/google/gemma-4-26B-A4B-it/tree/main and use it with --chat-template-file. This replaces the google-gemma-31B-it-interleaved.jinja file provided by llama.cpp.
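Put together, the steps above look roughly like this. This is a sketch assuming a llama.cpp build with --jinja support and the hf CLI installed; the GGUF filename is a placeholder for whichever quant you downloaded:

```shell
# Fetch only the updated template from the official repo into a local dir
hf download google/gemma-4-26B-A4B-it chat_template.jinja --local-dir ./tmpl

# Point llama-server at it; --jinja enables jinja template rendering,
# and --chat-template-file overrides the template embedded in the GGUF
llama-server -m ./gemma-4-26B-A4B-it-Q4_K_M.gguf \
  --jinja \
  --chat-template-file ./tmpl/chat_template.jinja
```

Once the GGUFs themselves ship with the new template, the --chat-template-file override can be dropped.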
akavel@reddit
Thank you! And seems both are now updated in the end 😄
Clean_Hyena7172@reddit
Will this fix the issues with reasoning not working?
Rektile142@reddit
Late reply, but it does. Using the fresh template now, and Gemma 26B is successfully triggering and interleaving its chain of thought each run.
Thomasedv@reddit
Really hope this fixes my issue with Gemma stopping before it's really done working. Aside from some leaking of the template in calls, Gemma will say "I'll do X now" and then just abruptly stop.
It's very obvious when swapping to another model that seems a lot more agentic when it follows its process (in my case glm-4.7). Hopefully it also helps with the looping issues, the edit functionality breaking, and such!
ab2377@reddit
What are you using it with? I used opencode today and hit exactly the same problem: it's like "I will now use grep to look for it", then it just stops, as if waiting for its reply.
Thomasedv@reddit
Claude Code, and I think I tried Qwen Chat as well.
MoffKalast@reddit
> refuses to elaborate further
> eos token
Corosus@reddit
Yeah, it does this a LOT for me, falsely emitting a stop token or something. I'm curious if anyone has any interesting ways to work around it. I was considering making it constantly update its progress to a .md file and running opencode in a loop with 'do the thing in the file'.
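That loop idea can be sketched like this. Everything here is an assumption for illustration: the DONE marker convention is made up, and the `opencode run` non-interactive invocation at the bottom should be swapped for whatever your client actually offers:

```shell
# Re-invoke an agent until it marks the progress file as done,
# so a premature stop just means another pass through the loop.
resume_loop() {
  progress="$1"; shift
  [ -f "$progress" ] || echo "task not started" > "$progress"
  until grep -q "DONE" "$progress"; do
    # The agent command is expected to update $progress itself each run
    "$@" "Continue the task described in $progress; append DONE when finished."
  done
}

# Hypothetical usage with opencode's non-interactive mode:
# resume_loop progress.md opencode run
```

The obvious caveat: if the model never writes the marker, this loops forever, so a max-iteration counter is probably worth adding.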
TechSwag@reddit
Unfortunately this doesn't seem to always fix that, even when using the template by manually defining it in the config.
It is a lot better about both following instructions and calling tools, and is doing interleaved thinking properly now.
Borkato@reddit
Wait so do we have to redownload the models or…
I hope to god this is the final fix because I swear Gemma STILL has issues with my homegrown setup that qwen has 0 problems with
Sadman782@reddit
Why redownload the model? Just download the jinja file and use --jinja --chat-template-file
Borkato@reddit
Just in case. There are all kinds of fixes being done constantly
yrro@reddit
Hopefully the ggml-org GGUFs will be rebuilt with the new template, but until then you can run hf download google/gemma-4-26B-A4B-it chat_template.jinja and then use --chat-template-file ~/.cache/huggingface/hub/models--google--gemma-4-26B-A4B-it/snapshots/1db3cff1840c2ae59759d8e842ff37831cf8cb63/chat_template.jinja.
Voxandr@reddit
Same here. But gonna try with this fix.
RateRevolutionary370@reddit
Is it possible to get this to work with LM Studio? I'm copying the Jinja code into the prompt template box, but the model is saying "This message contains no content. The AI has nothing to say." I'm using Gemma 4 26b A4B (Q4_K_M).
winna-zhang@reddit
nice, was hitting some weird tool call formatting issues before
did you notice it actually improves consistency or just fixes edge cases?
Sadman782@reddit
it is way better now, try
BrianJThomas@reddit
I'm having luck with 31B now, but 26B still runs into issues for me.
Sadman782@reddit
try this jinja: https://pastebin.com/raw/hnPGq0ht
BrianJThomas@reddit
Oh nice. I didn't see that the original model was updated a few hours ago. Trying again...
Kodix@reddit
So just use the --chat-template-file flag with this new template with the newest self-compiled llama.cpp and that's all, yeah?
Probably this alone won't be enough to fix the model looping and the tool call issues/"I'll do x", but once those are fixed, this model's golden.
Sadman782@reddit
you can try: https://pastebin.com/raw/hnPGq0ht
it is much better for me
Kodix@reddit
Thank you! Where'd you get that from? Is that the interleaved template, or something else?
Sadman782@reddit
Google updated the official one a few hours ago: https://huggingface.co/google/gemma-4-26B-A4B-it/blob/main/chat_template.jinja, and I had Gemini fix it a bit further. That version is better than the updated official one, so you can try both and check which is better for you.
https://pastebin.com/raw/hnPGq0ht works better for me.
Voxandr@reddit
Gonna try.
Sadman782@reddit
It seems the official template still has issues; Gemini fixed it a bit and it seems better now. It is properly calling multiple tools, whereas before it was ignoring some tools and descriptions completely:
https://pastebin.com/hnPGq0ht
david_0_0@reddit
The tool call improvements are critical for agentic workloads. Worth noting, though: if you're running inference servers with cached jinja templates, the old format might break mid-stream. Did the PR maintain backward compatibility, or do existing quantized versions need rebuilding? Also curious whether the dialog compliance fixes affect instruction following, since tighter compliance sometimes reduces model creativity.
FoxiPanda@reddit
Google changed Gemma4 stuff again? I'm dying on the inside right now lol.