Qwen3.6-35B - Terrible instruction following when using context files (with vanilla pi-agent). Model issue or am I doing something wrong?

Posted by FusionX@reddit | LocalLLaMA | 7 comments

First of all, I am really impressed with Qwen 35B's first-class agentic behaviour and tool calling support. I've been exploring it for general tasks where I prompt the model to research and analyze using tool calls and scripts, and it has exceeded my expectations. Until now...

During some of the runs, I noticed a few common mistakes that kept cropping up, due to the nature of the task itself. Nothing that an AGENTS.md couldn't fix. So, I added a few (3-4) simple instructions to address them. Here is where things go wrong: the model completely IGNORES these prior instructions, unless I explicitly remind it during the actual chat. (Yes, the context file is pre-filled, I confirmed that.)

Example:

- AGENTS.md instruction: Never read a file directly into the context window without knowing its size. A large file might overload the context window.
- User prompt: explore list.txt and analyze.
- Result: It tries to read list.txt directly without bothering to check the size.
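For reference, the behaviour the AGENTS.md rule asks for is trivial for the agent to do with a single tool call before reading. A minimal sketch (the file name, size limit, and messages here are hypothetical, just illustrating the check):

```shell
# Hypothetical pre-read size check, as the AGENTS.md rule intends.
f="list.txt"
printf 'one\ntwo\n' > "$f"   # stand-in file just for this demo

size=$(wc -c < "$f")         # byte count without reading contents into context
limit=65536                  # hypothetical context budget in bytes

if [ "$size" -le "$limit" ]; then
  echo "safe to read ($size bytes)"
else
  echo "too large ($size bytes); read in chunks or summarize"
fi
```

The point is that the model is perfectly capable of running a check like this when reminded in-chat; it just never consults the AGENTS.md rule on its own.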

Am I doing something wrong? I'm really betting on this being a configuration issue, because the model had otherwise been exceeding my expectations. I tried a lot of things, from changing quants to removing llama.cpp params, to find the culprit, but nothing has helped so far.

Setup:

bartowski's Qwen3.6-35B-Q5_K_L with the officially recommended sampling parameters for general tasks (tried the coding params too, same result), and the latest llama.cpp build on Linux with CUDA 13.2

llama-server --model models/bartowski/Qwen_Qwen3.6-35B-A3B-GGUF/Qwen_Qwen3.6-35B-A3B-Q5_K_L.gguf -fitt 128 -fa on --jinja --no-mmap --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --presence-penalty 0.0 --repeat-penalty 1.0 --chat-template-kwargs '{"preserve_thinking": true}' -ctk q8_0 -ctv q8_0 -c 128000

Using it with the latest vanilla pi coding agent.