Qwen3.6 is maintaining context inside the CoT
Posted by Big_Mix_4044@reddit | LocalLLaMA | View on Reddit | 20 comments
I tested it in several iterations, and although it's sometimes hard to make the model stick to the number, it reliably remembered the number when it was chosen during reasoning. You have to add --chat-template-kwargs '{"preserve_thinking": true}' for this to actually work.
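For context, preserve_thinking controls whether earlier assistant turns keep their <think>…</think> blocks when the chat template rebuilds the prompt for the next turn. A minimal toy sketch of the idea (this is not the actual Qwen Jinja template, just an illustration):

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def render_history(messages, preserve_thinking=False):
    """Rebuild prior turns, optionally stripping assistant reasoning blocks.
    Toy illustration of the behavior the kwarg toggles."""
    rendered = []
    for msg in messages:
        content = msg["content"]
        if msg["role"] == "assistant" and not preserve_thinking:
            content = THINK_RE.sub("", content)  # default: drop old CoT
        rendered.append(f'{msg["role"]}: {content}')
    return "\n".join(rendered)

history = [
    {"role": "user", "content": "Pick a number between 1 and 10."},
    {"role": "assistant",
     "content": "<think>I'll secretly pick 7.</think>Okay, I've picked one."},
    {"role": "user", "content": "Was it 7?"},
]

# Default: the model never sees which number it picked last turn.
print(render_history(history))
# Preserved: the CoT (and the chosen number) stays in context.
print(render_history(history, preserve_thinking=True))
```

With preservation off, the number chosen inside the CoT is gone from the prompt, so the model can only guess or hallucinate on the follow-up question.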
seppe0815@reddit
nothing beats the new gemma 4 llms ... end of discussion!
Far-Low-4705@reddit
I’m not super sure what the purpose of this feature is.
The main context is in the final output, rarely is the content of the reasoning critical like in the above example.
Also, it just consumes far more of the context window, which reduces performance and speeds up context rot.
Borkato@reddit
Agentic tasks that require remembering why the model didn’t do x or y instead of z, so that it doesn’t try x or y if z fails
Big_Mix_4044@reddit (OP)
Reasoning context counts either way.
TheCTRL@reddit
Confirmed. For LM Studio, edit the Jinja prompt template and add at the top: {%- set preserve_thinking = true %}
TheCTRL@reddit
If you update now to 0.4.12-1 you'll find a specific option for that
nicholas_the_furious@reddit
I just updated but don't see the option. Where can one find it? I don't see it in the Sampling section of the Inference tab.
swiss_aspie@reddit
Any idea why this would not be the default?
No-Refrigerator-1672@reddit
It increases KV cache usage significantly, which may be problematic on midrange consumer hardware with limited VRAM.
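As a rough back-of-envelope for why that matters: each preserved token costs K and V entries across every layer. All the model numbers below are illustrative assumptions, not Qwen's actual specs:

```python
# Back-of-envelope KV-cache cost of preserving reasoning tokens.
# Layer/head/dim values are made-up assumptions for illustration only.
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem=2):
    # 2x for separate K and V tensors; fp16 = 2 bytes per element
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

extra_tokens = 8_000  # hypothetical reasoning tokens kept across turns
per_token = kv_bytes_per_token(layers=48, kv_heads=8, head_dim=128)
print(f"{per_token} bytes/token -> {extra_tokens * per_token / 2**20:.0f} MiB extra")
# prints: 196608 bytes/token -> 1500 MiB extra
```

So a few thousand retained CoT tokens can easily eat a GiB or more of VRAM on top of the normal context, which is why it isn't free on midrange cards.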
qado@reddit
Thanks !
robertpro01@reddit
Can someone explain how this makes the model better (or worse)?
Genuinely asking.
MaxKruse96@reddit
If they say "do this" and then make it optional in the chat template... qwen wtf are you doing
GirthusThiccus@reddit
I suggest less drugs and more sleep.
Big_Mix_4044@reddit (OP)
I suppose it can slow down prompt processing (pp). It's a nice-to-have feature, but not really necessary.
Spirited-Toe-3988@reddit
Interesting — does this hold under longer contexts (e.g. 32K+), or is it mostly visible in shorter interactions? Curious how stable it is when the prompt gets large.
SimilarWarthog8393@reddit
I just did a similar experiment with a word guessing game, but the model hallucinated the word it chose during CoT. Wondering if it's the GUI not passing the reasoning content?
z_3454_pfk@reddit
you need preserve_thinking as true
SimilarWarthog8393@reddit
Yes, of course I passed the chat template kwarg haha, that's why I was confused why the model still didn't retain the thinking content.
Electronic-Metal2391@reddit
Like taking a chocolate from a baby!
jingtianli@reddit
Yeah, this only works with preserve_thinking=true.
Otherwise the LLM will pick a new number every time.