Qwen3.6 merged chat template from allanchan339 and froggeric
Posted by fakezeta@reddit | LocalLLaMA | View on Reddit | 34 comments
Hi,
recently froggeric and allanchan339 released enhanced/fixed template for Qwen3.6 each one addressing different topics.
I didn't know which one to use so I merged both with the help of Claude Opus to have the best of both.
I've uploaded it to this gist
https://gist.github.com/fakezeta/9e8e039c60332fcb143c6e805558afe0
Here a summary table done with Opus
| Feature | allanchan339 | froggeric | Merged |
|---|---|---|---|
| Long strict tool rules + follow-up example | ✅ | ❌ | ✅ |
developer role accepted |
❌ | ✅ | ✅ |
<|think_off|> / <|think_on|> toggles |
❌ | ✅ | ✅ |
| Historical reasoning hidden by default | ✅ | ❌ | ✅ |
String tool args parsed as JSON into <parameter> blocks |
✅ | ❌ | ✅ |
Non-ASCII in JSON escaped (uXXXX) |
❌ | ✅ | ✅ |
</thinking> recognized (not just </think>) |
❌ | ✅ | ✅ |
Auto-close unclosed <think> before <tool_call> |
✅ | ❌ | ✅ |
| Vision + tool_response structure | same | same | same |
I've tested with llama-server and Qwen3.6 35B A3B
Hope you like it.
If there is anything good the praise it for froggeric and allanchan339.
Any blame instead is for me but please be kind 😄
ex-arman68@reddit
Thanks for your work. I have checked your merged template and allanchan339. Here are my thoughts:
1. Long strict tool rules: allanchan339 uses a much longer version (300 tokens). It can be useful for a specific case of agentic tool calling, but consumes more token. My version is not quite as thorough but works fine in most cases. Verdict: use mine.
2. Historical reasoning hidden by default: we use a different approach. Mine uses a more nuanced condition, respecting enable_thinking, allanchan339 does not. Verdict: use mine.
3. String tool args pased as json: when the tool call argument is a json string, my version just dumps the raw string. allanchan339 parses it and renders each key as a separate parameter block. Verdict: use allanchan339.
4. Autoclose unclosed think before tool call: allanchan339 add logic to detect when a think block is open but not closed before a tool call happens, and injects the cdlosing tag automatically. This can preven malformed output in edge cases. Verdict: use allanchan339.
I will be updating my template with the changes from allanchan339
ex-arman68@reddit
I have now finished doing a more thorough analysis, and porting what was needed in my template. The problem is some of the fixes in allanchan339 template are specific to vLLM and will break other tools; for example the string tool args passed as json. It was the same with the original template from Qwen: it includes many settings specific to vLLM and not compatible with other tools; this is why community templates are needed.
The autoclose unclosed think before tool call is a good addition, is universally compatible, and I have now ported it into my updated Qwen 3.5 and Qwen 3.5 chat templates, which are universally compatible with all tools:
https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates
fakezeta@reddit (OP)
It’s an honor that you took the time to look at my work. Thank you so much for everything you’ve done and will do.
llitz@reddit
Thanks, for the publishing your version it has fixed a few edge cases.
Ok_Technology_5962@reddit
Wow! Amazing. This fixed Qwen 3.6 for me... And now im stealing this and updated Minimax m2.7 templates posting to hugging face Hunterx/MinimaxM2.7FixedTemplate https://huggingface.co/Hunterx/MinimaxM2.7FixedTemplate
zkkzkk32312@reddit
Question, is this template meant to be use with a spsific harness? Like qwen cli? Or can it be used with opencode/pi/copilot/cline as well?
fakezeta@reddit (OP)
Can be used with any client even with Open WebUI with tool calls.
Varmez@reddit
How do I use this with oMLX?
fakezeta@reddit (OP)
Never used oMLX but looking at mlx-community HF Qwen3.6 repos ther is a
chat_template.jinjafile that should be overwritten.If someone has more experience is welcome to integrate.
jinnyjuice@reddit
After replacing the chat template Jinja file, do I need to re-run SGLang/vLLM, or does it hot-load?
fakezeta@reddit (OP)
according to my knowledge both vLLM and SGLang read the template only at process startup so it need reload/restart
jinnyjuice@reddit
I see, thanks.
I seem to be getting errors with this new template, unfortunately.
Is this new template for
coderorxml?fakezeta@reddit (OP)
The template is for the Coder-style tool-call format. I think the proper option with SGLang should be:
--tool-call-parser qwen3_coderDarkGhostHunter@reddit
First prompt to Zed Agent with the template and got this:
fakezeta@reddit (OP)
Thank you for reporting. It seems that llama.cpp Jinja implementation (Minja) does not support the
is sequencetest. I uploaded a v2 usingis iterablethat is supported by Minja and practically means the same thing here since I already exclude strings and mappings.thaatz@reddit
Thanks for sharing!
should i use preserve thinking by adding `
{%- set preserve_thinking = true %` to the top of the template? does it play well?dtdisapointingresult@reddit
I haven't tested it, but...idk, it depends.
I wouldn't hard-code this into the chat template, I'd use the chat-template-kwargs flag like Qwen recommend to turn it off/on in the launcher.
fakezeta@reddit (OP)
same way as the original templates. One way adding
{%- set preserve_thinking = true %` to the top of the template.Other ways is starting vLLM with option:
or for llama-server:
escape '{" according to your shell/OS
buttplugs4life4me@reddit
If you use llama-swap in PowerShell, its a little unintuitive with
{\"preserve_thinking\" : true}I. E. Dont escape the curly bracesDuranteA@reddit
I was quite skeptical, but in ~1 hour of testing so far this has substantially reduced instances of "Invalid API Response" in my use case (working with Cline on a medium-sized C++ code base with a few additional MCP servers and tools).
If you suffer from malformed responses (especially unclosed tags) then do give this a try. Thanks OP for sharing!
noclip1@reddit
Would love for someone to explain to me how a chat template can be community modified to possibly out perform (or fix bugs) in the intended chat template the Qwen team released and would've been using in training and their own inference testing?
tempedbyfate@reddit
I think there are probably thousands of more people in the community who are tinkering with these bleeding edge models vs the Qwen team. Not everything produced by the community will necessarily be better though.
tednoob@reddit
What goes into the model is a linear token stream and these templates moves around data to best fit what the model expects. Like when a model understands both the openai or anthropic endpoint formats it is not necessarily because it was taught to recognise it, these templates map the data to be structured similar to how the model was trained. A concrete example is how thinking and output are separated into different flows, but it's all just one flow underneath. Open models are often research first, and they don't think that deeply about how the model will fit in e.g llama.cpp serving an openai api.
ambient_temp_xeno@reddit
If it wasn't for the 'used claude opus' part lowering the odds, there would be a 50/50 chance of this being an improvement, historically speaking.
The explanation vaguely summed up: 'life on the frontier'.
shockwaverc13@reddit
well gemma 4's template had to be changed multiple times despite the original being the "intended" one
Helicopter-Mission@reddit
I mean, that’s how many projects progress, taking or forking from external contributions
desmin88@reddit
That’s the neat part, it doesn’t.
kr_tech@reddit
After replacing the chat template Jinja file, do I have to re-run SGLang/vLLM? Or does it hot-load?
Due-Opportunity6212@reddit
W.
Dany0@reddit
preserve thinking off destroys performance IME. same or worse than q3.5 27b
ps5cfw@reddit
Yeah that's a big no no, not sure I'm willing to try this chat template since I'm not sure if the preserve_thinking flag works or not
Dany0@reddit
hey sorry I misunderstood op's post and the template. I got confused because I thought OP's templaye was allanchan's which did turn off thinking. I expected the first column to refer to this template
he didn't force preserve thinking off, I modified my original comment
fakezeta@reddit (OP)
preserve thinking flag works. At least in my testing 😛
ps5cfw@reddit
Well then I Will give It a try, froggeric template seems to work Better for me than the unsloth template BUT I am seeing a lot of cache invalidations with opencode.