Qwen3.6 merged chat template from allanchan339 and froggeric

Posted by fakezeta@reddit | LocalLLaMA | View on Reddit | 34 comments

Hi,

recently froggeric and allanchan339 released enhanced/fixed template for Qwen3.6 each one addressing different topics.
I didn't know which one to use so I merged both with the help of Claude Opus to have the best of both.

I've uploaded it to this gist
https://gist.github.com/fakezeta/9e8e039c60332fcb143c6e805558afe0

Here a summary table done with Opus

Feature	allanchan339	froggeric	Merged
Long strict tool rules + follow-up example	✅	❌	✅
`developer` role accepted	❌	✅	✅
`<\|think_off\|>` / `<\|think_on\|>` toggles	❌	✅	✅
Historical reasoning hidden by default	✅	❌	✅
String tool args parsed as JSON into `<parameter>` blocks	✅	❌	✅
Non-ASCII in JSON escaped (`uXXXX`)	❌	✅	✅
`</thinking>` recognized (not just `</think>`)	❌	✅	✅
Auto-close unclosed `<think>` before `<tool_call>`	✅	❌	✅
Vision + tool_response structure	same	same	same

I've tested with llama-server and Qwen3.6 35B A3B

Hope you like it.
If there is anything good the praise it for froggeric and allanchan339.

Any blame instead is for me but please be kind 😄

[-]

ex-arman68@reddit

Thanks for your work. I have checked your merged template and allanchan339. Here are my thoughts:

1. Long strict tool rules: allanchan339 uses a much longer version (300 tokens). It can be useful for a specific case of agentic tool calling, but consumes more token. My version is not quite as thorough but works fine in most cases. Verdict: use mine.

2. Historical reasoning hidden by default: we use a different approach. Mine uses a more nuanced condition, respecting enable_thinking, allanchan339 does not. Verdict: use mine.

3. String tool args pased as json: when the tool call argument is a json string, my version just dumps the raw string. allanchan339 parses it and renders each key as a separate parameter block. Verdict: use allanchan339.

4. Autoclose unclosed think before tool call: allanchan339 add logic to detect when a think block is open but not closed before a tool call happens, and injects the cdlosing tag automatically. This can preven malformed output in edge cases. Verdict: use allanchan339.

I will be updating my template with the changes from allanchan339

[-]

ex-arman68@reddit

I have now finished doing a more thorough analysis, and porting what was needed in my template. The problem is some of the fixes in allanchan339 template are specific to vLLM and will break other tools; for example the string tool args passed as json. It was the same with the original template from Qwen: it includes many settings specific to vLLM and not compatible with other tools; this is why community templates are needed.

The autoclose unclosed think before tool call is a good addition, is universally compatible, and I have now ported it into my updated Qwen 3.5 and Qwen 3.5 chat templates, which are universally compatible with all tools:

I see, thanks.

I seem to be getting errors with this new template, unfortunately.

Is this new template for coder or xml?

[-]

fakezeta@reddit (OP)

The template is for the Coder-style tool-call format. I think the proper option with SGLang should be: --tool-call-parser qwen3_coder

[-]

DarkGhostHunter@reddit

First prompt to Zed Agent with the template and got this:

Error rendering prompt with jinja template: "Unknown test: sequence".

This is usually an issue with the model's prompt template. If you are using a popular model, you can try to search the model under lmstudio-community, which will have fixed prompt templates. If you cannot find one, you are welcome to post this issue to our discord or issue tracker on GitHub. Alternatively, if you know how to write jinja templates, you can override the prompt template in My Models > model settings > Prompt Template.

[-]

fakezeta@reddit (OP)

Thank you for reporting. It seems that llama.cpp Jinja implementation (Minja) does not support the is sequence test. I uploaded a v2 using is iterable that is supported by Minja and practically means the same thing here since I already exclude strings and mappings.

[-]

thaatz@reddit

Thanks for sharing!

should i use preserve thinking by adding `{%- set preserve_thinking = true %` to the top of the template? does it play well?

[-]

dtdisapointingresult@reddit

I haven't tested it, but...idk, it depends.

Pro: Keeps historical reasoning, allowing the agent to better understand the reasoning for doing certain things in previous steps. Counter: token usage increases even for problems you solved previously in the session and no longer care about, and now this hurts all current and future tasks.
Pro: better for prompt cache due to not deleting stuff from previous turns. Counter: when it's turned off, only the previous turn's reasoning is deleted, so you are only recomputing Previous + Current message, not the whole history, this isn't a big deal.

I wouldn't hard-code this into the chat template, I'd use the chat-template-kwargs flag like Qwen recommend to turn it off/on in the launcher.

[-]

fakezeta@reddit (OP)

same way as the original templates. One way adding {%- set preserve_thinking = true %` to the top of the template.
Other ways is starting vLLM with option:

--default-chat-template-kwargs '{"preserve_thinking": true}'

or for llama-server:

--chat-template-kwargs '{"preserve_thinking":true}'

escape '{" according to your shell/OS

[-]

buttplugs4life4me@reddit

If you use llama-swap in PowerShell, its a little unintuitive with {\"preserve_thinking\" : true} I. E. Dont escape the curly braces

[-]

DuranteA@reddit

I was quite skeptical, but in ~1 hour of testing so far this has substantially reduced instances of "Invalid API Response" in my use case (working with Cline on a medium-sized C++ code base with a few additional MCP servers and tools).

If you suffer from malformed responses (especially unclosed tags) then do give this a try. Thanks OP for sharing!

[-]

noclip1@reddit

Would love for someone to explain to me how a chat template can be community modified to possibly out perform (or fix bugs) in the intended chat template the Qwen team released and would've been using in training and their own inference testing?

[-]

tempedbyfate@reddit

I think there are probably thousands of more people in the community who are tinkering with these bleeding edge models vs the Qwen team. Not everything produced by the community will necessarily be better though.

[-]

tednoob@reddit

What goes into the model is a linear token stream and these templates moves around data to best fit what the model expects. Like when a model understands both the openai or anthropic endpoint formats it is not necessarily because it was taught to recognise it, these templates map the data to be structured similar to how the model was trained. A concrete example is how thinking and output are separated into different flows, but it's all just one flow underneath. Open models are often research first, and they don't think that deeply about how the model will fit in e.g llama.cpp serving an openai api.

[-]

ambient_temp_xeno@reddit

If it wasn't for the 'used claude opus' part lowering the odds, there would be a 50/50 chance of this being an improvement, historically speaking.

The explanation vaguely summed up: 'life on the frontier'.

[-]

fakezeta@reddit (OP)

preserve thinking flag works. At least in my testing 😛

[-]

ps5cfw@reddit

Well then I Will give It a try, froggeric template seems to work Better for me than the unsloth template BUT I am seeing a lot of cache invalidations with opencode.