Qwen 3.6: worse adherence?

Posted by tkon3@reddit | LocalLLaMA | View on Reddit | 42 comments

Just swapped Qwen 3.5 for the 3.6 variant (FP8, RTX 6000 Pro) using the same recommended generation settings. My stack is vLLM (v0.19.0) + Open WebUI (v0.8.12) in a RAG setup where the model has access to several document retrieval tools.
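For reference, a launch sketch along the lines of what I'm running. The `--enable-auto-tool-choice`, `--tool-call-parser`, and `--chat-template` flags are real vLLM options; the model name, parser choice, and template path below are placeholders — check the model card for what your checkpoint actually ships with. Pinning the template file explicitly rules out vLLM silently picking up a stale bundled one.

```shell
# Hypothetical serving config -- model name, parser, and template path
# are assumptions, not the actual 3.6 release artifacts.
vllm serve Qwen/Qwen3.6-FP8 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --chat-template ./qwen3.6_chat_template.jinja \
  --max-model-len 32768
```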

After some initial testing (single-turn; I haven't tried disabling interleaved reasoning yet), I've noticed some significant shifts:

- 3.6 is far more "talkative" with tools: reasoning traces that used to run a few dozen tokens now run several hundred.

- It struggles to follow specific instructions compared to 3.5.

- It seems to give the system prompt much less weight, sometimes ignoring it outright.

- Despite being prompted for exhaustive answers, the final responses are significantly shorter.

I suspect an issue with the chat template or with how vLLM handles the new weights, even though the architecture is unchanged. Anyone else seeing similar problems?
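One quick way to check the template theory is to render the chat template yourself and eyeball what the model actually receives. A minimal sketch, assuming Qwen's ChatML-style `<|im_start|>`/`<|im_end|>` markers — the template string here is illustrative only; the real one lives in the checkpoint's `tokenizer_config.json` and is what you'd diff between 3.5 and 3.6:

```python
from jinja2 import Template

# Minimal ChatML-style template modeled on Qwen's format.
# Illustrative only -- pull the actual template from tokenizer_config.json.
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

messages = [
    {"role": "system", "content": "Answer exhaustively, citing retrieved docs."},
    {"role": "user", "content": "Summarize the retrieved documents."},
]

# Render exactly what would be fed to the model for this turn.
rendered = Template(CHAT_TEMPLATE).render(
    messages=messages, add_generation_prompt=True
)
print(rendered)
```

If the system turn comes out mangled or dropped when rendered with the 3.6 template (e.g. with tool definitions in the message list), that would explain the weak system-prompt adherence better than a weights regression.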