Qwen3.6 is maintaining context inside the CoT
Posted by Big_Mix_4044@reddit | LocalLLaMA | View on Reddit | 20 comments
I tested it in several iterations, and although it's sometimes hard to make the model stick to the number, it reliably remembered the number when it was chosen during reasoning. You have to add --chat-template-kwargs '{"preserve_thinking": true}' for this to actually work.
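For context, preserve_thinking controls whether earlier assistant turns keep their <think>…</think> blocks when the chat template rebuilds the prompt for the next turn. A minimal toy sketch of the idea (this is not the actual Qwen Jinja template, just an illustration):

```python
import re

THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def render_history(messages, preserve_thinking=False):
    """Rebuild prior turns, optionally stripping assistant reasoning blocks.
    Toy illustration of the behavior the kwarg toggles."""
    rendered = []
    for msg in messages:
        content = msg["content"]
        if msg["role"] == "assistant" and not preserve_thinking:
            content = THINK_RE.sub("", content)  # default: drop old CoT
        rendered.append(f'{msg["role"]}: {content}')
    return "\n".join(rendered)

history = [
    {"role": "user", "content": "Pick a number between 1 and 10."},
    {"role": "assistant",
     "content": "<think>I'll secretly pick 7.</think>Okay, I've picked one."},
    {"role": "user", "content": "Was it 7?"},
]

# Default: the model never sees which number it picked last turn.
print(render_history(history))
# Preserved: the CoT (and the chosen number) stays in context.
print(render_history(history, preserve_thinking=True))
```

With preservation off, the number chosen inside the CoT is gone from the prompt, so the model can only guess or hallucinate on the follow-up question.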
seppe0815@reddit
nothing beats the new gemma 4 llms ... end of discussion!
Far-Low-4705@reddit
I’m not super sure what the purpose of this feature is.
The main context is in the final output, rarely is the content of the reasoning critical like in the above example.
Also, it just consumes far more of the context window, which reduces performance and speeds up context rot.
Borkato@reddit
Agentic tasks that require remembering why the model didn’t do x or y instead of z, so that it doesn’t try x or y if z fails
Big_Mix_4044@reddit (OP)
Reasoning context counts either way.
TheCTRL@reddit
Confirmed. For LM Studio, edit the Jinja prompt template and add at the top: {%- set preserve_thinking = true %}
TheCTRL@reddit
If you update now to 0.4.12-1 you'll find a specific option for that
nicholas_the_furious@reddit
I just updated but don't see the option. Where can one find it? I don't see it in the Sampling section of the Inference tab.
swiss_aspie@reddit
Any idea why this would not be the default?
No-Refrigerator-1672@reddit
It increases KV cache usage significantly, which may be problematic on midrange consumer hardware with limited VRAM.
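As a rough back-of-envelope for why that matters: each preserved token costs K and V entries across every layer. All the model numbers below are illustrative assumptions, not Qwen's actual specs:

```python
# Back-of-envelope KV-cache cost of preserving reasoning tokens.
# Layer/head/dim values are made-up assumptions for illustration only.
def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem=2):
    # 2x for separate K and V tensors; fp16 = 2 bytes per element
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

extra_tokens = 8_000  # hypothetical reasoning tokens kept across turns
per_token = kv_bytes_per_token(layers=48, kv_heads=8, head_dim=128)
print(f"{per_token} bytes/token -> {extra_tokens * per_token / 2**20:.0f} MiB extra")
# prints: 196608 bytes/token -> 1500 MiB extra
```

So a few thousand retained CoT tokens can easily eat a GiB or more of VRAM on top of the normal context, which is why it isn't free on midrange cards.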
qado@reddit
Thanks !
robertpro01@reddit
Can someone explain how this makes the model better (or worse)?
Genuinely asking.
MaxKruse96@reddit
If they say "do this" and then make it optional in the chat template... qwen wtf are you doing
GirthusThiccus@reddit
I suggest less drugs and more sleep.
Big_Mix_4044@reddit (OP)
I suppose it can slow down prompt processing (pp). It's a nice-to-have feature, but not really necessary.
Spirited-Toe-3988@reddit
Interesting — does this hold under longer contexts (e.g. 32K+), or is it mostly visible in shorter interactions? Curious how stable it is when the prompt gets large.
SimilarWarthog8393@reddit
I just did a similar experiment with a word guessing game, but the model hallucinated the word it chose during CoT. Wondering if it's the GUI not passing the reasoning content?
z_3454_pfk@reddit
you need preserve_thinking as true
SimilarWarthog8393@reddit
Yes, of course I passed the chat template kwarg haha, that's why I was confused why the model still didn't retain the thinking content.
Electronic-Metal2391@reddit
Like taking a chocolate from a baby!
jingtianli@reddit
Yeah, this only works with preserve_thinking=true.
Otherwise the LLM will pick a new number every time.