Has anyone figured out why Claude Code running qwen locally fails when you try to /compact?
Posted by fredandlunchbox@reddit | LocalLLaMA | 16 comments
I’ve tried a few suggested solutions but nothing has worked so far.
Is claude trained to respond in a particular way that qwen doesn’t know about?
I’m not sure how to debug it, since I can’t see the responses from either model inside the harness.
PositiveBit01@reddit
Are you running locally without a subscription? I used to be able to do that but it stopped working recently
a_beautiful_rhind@reddit
These tools are shit at context management and expect huge context windows. You will have to scare one up that respects limits correctly. Roo was like that but they want to play SaaS provider now.
TheseTradition3191@reddit
/compact works by sending the full conversation to the model with a specific instruction block about how to format the output. the failure with local models is usually not context size but response format: claude was fine-tuned to return compaction output in the format claude code expects, qwen returns something structurally different, and cc can't parse it and errors out.
easiest debug: watch your local server logs during a /compact attempt. you'll see the exact prompt cc sends including the output format instructions. then try running just that prompt against qwen directly in your api client to see what it actually returns vs what cc expects.
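if you want to script the replay instead of pasting into a client, a rough sketch against an openai-compatible local server (the endpoint, model name, and prompt here are placeholders, not anything cc-specific):

```python
import requests

# the exact compaction prompt captured from your server logs goes here
COMPACTION_PROMPT = "...paste the captured prompt here..."

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # placeholder base URL
    json={
        "model": "qwen",  # placeholder: whatever name your server registered
        "messages": [{"role": "user", "content": COMPACTION_PROMPT}],
        "max_tokens": 4096,
    },
    timeout=600,  # generous, so slow local generation can finish
)
resp.raise_for_status()
# compare this raw output against the format instructions in the prompt
print(resp.json()["choices"][0]["message"]["content"])
```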
timeout is also worth ruling out separately. local inference on a large context plus the compaction instruction is slow, and cc has request timeouts. if qwen starts generating but cc kills the connection before it's done, you'd see the same failure with a different root cause.
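rough way to measure that, same placeholder endpoint and prompt as above:

```python
import time
import requests

start = time.monotonic()
first_token = None

with requests.post(
    "http://localhost:8080/v1/chat/completions",  # placeholder base URL
    json={
        "model": "qwen",  # placeholder model name
        "messages": [{"role": "user", "content": "...captured compaction prompt..."}],
        "stream": True,
    },
    stream=True,
    timeout=600,
) as resp:
    resp.raise_for_status()
    # openai-compatible servers stream SSE lines like: data: {...chunk...}
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        if line == b"data: [DONE]":
            break
        if first_token is None:
            first_token = time.monotonic() - start

total = time.monotonic() - start
print(f"time to first token: {first_token:.1f}s" if first_token else "no tokens received")
print(f"total time: {total:.1f}s")
```

if the total is past whatever cc's request timeout is, there's your answer.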
Destructi0@reddit
Recently tried switching to local qwen3.6 with VRAM constraints.
My cc handled compaction well with 130k context + a 50% compaction threshold.
But it was slow as hell - maybe you're dealing with a timeout? it can be changed in cc afaik
mister2d@reddit
it needs a 200k ctx window.
fredandlunchbox@reddit (OP)
I have a 260k context window.
SeyAssociation38@reddit
Because it was not designed for anything other than Claude. Its source code has leaked; you can use opencode instead.
fredandlunchbox@reddit (OP)
No, they support it first-party in the product. You can run claude code desktop with any model you want.
Altruistic_Heat_9531@reddit
context size maybe?
fredandlunchbox@reddit (OP)
Fails even if you just run a few messages and compact.
OneSlash137@reddit
Using a Ferrari with a 2-stroke motor powering it…
fredandlunchbox@reddit (OP)
It’s very capable on smaller-scale tasks. The prompts are very powerful, even with a weaker model.
arcanemachined@reddit
Not a Ferrari guy... do they also break down every 2 weeks?
Electrical-Shape-266@reddit
If you can’t see the prompt and output, you’re basically blind here.
merica420_69@reddit
Qwen 3.5 be like that
m94301@reddit
Is this local? You can see the query / response on most servers to help debug