Has anyone figured out why Claude Code running qwen locally fails when you try to /compact?
Posted by fredandlunchbox@reddit | LocalLLaMA | 16 comments
I’ve tried a few suggested solutions but nothing has worked so far.
Is claude trained to respond in a particular way that qwen doesn’t know about?
I’m not sure how to debug it, since I can’t see the responses from either model inside the harness.
PositiveBit01@reddit
Are you running locally without a subscription? I used to be able to do that but it stopped working recently
a_beautiful_rhind@reddit
These tools are shit at context management and expect huge context windows. You will have to scare one up that respects limits correctly. Roo was like that but they want to play SaaS provider now.
TheseTradition3191@reddit
/compact works by sending the full conversation to the model with a specific instruction block about how to format the output. the failure with local models is usually not context size but response format: claude was fine-tuned to return compaction output in the format claude code expects, qwen returns something structurally different, and cc can't parse it and errors out.
easiest debug: watch your local server logs during a /compact attempt. you'll see the exact prompt cc sends including the output format instructions. then try running just that prompt against qwen directly in your api client to see what it actually returns vs what cc expects.
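if you want to script the replay instead of pasting into a client, a rough sketch against an openai-compatible local server (the endpoint, model name, and prompt here are placeholders, not anything cc-specific):

```python
import requests

# the exact compaction prompt captured from your server logs goes here
COMPACTION_PROMPT = "...paste the captured prompt here..."

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # placeholder base URL
    json={
        "model": "qwen",  # placeholder: whatever name your server registered
        "messages": [{"role": "user", "content": COMPACTION_PROMPT}],
        "max_tokens": 4096,
    },
    timeout=600,  # generous, so slow local generation can finish
)
resp.raise_for_status()
# compare this raw output against the format instructions in the prompt
print(resp.json()["choices"][0]["message"]["content"])
```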
timeout is also worth ruling out separately. local inference on a large context plus the compaction instruction is slow, and cc has request timeouts. if qwen starts generating but cc kills the connection before it's done, you'd see the same failure with a different root cause.
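rough way to measure that, same placeholder endpoint and prompt as above:

```python
import time
import requests

start = time.monotonic()
first_token = None

with requests.post(
    "http://localhost:8080/v1/chat/completions",  # placeholder base URL
    json={
        "model": "qwen",  # placeholder model name
        "messages": [{"role": "user", "content": "...captured compaction prompt..."}],
        "stream": True,
    },
    stream=True,
    timeout=600,
) as resp:
    resp.raise_for_status()
    # openai-compatible servers stream SSE lines like: data: {...chunk...}
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        if line == b"data: [DONE]":
            break
        if first_token is None:
            first_token = time.monotonic() - start

total = time.monotonic() - start
print(f"time to first token: {first_token:.1f}s" if first_token else "no tokens received")
print(f"total time: {total:.1f}s")
```

if the total is past whatever cc's request timeout is, there's your answer.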
Destructi0@reddit
Recently tried switching to local qwen3.6 with VRAM constraints.
My cc handled compaction well with 130k context + a 50% compaction threshold.
But it was slow as hell - maybe you're dealing with a timeout? it can be changed in cc afaik
mister2d@reddit
it needs a 200k ctx window.
fredandlunchbox@reddit (OP)
I have a 260k context window.
SeyAssociation38@reddit
Because it was not designed for anything other than Claude. Its source code has leaked; you can use opencode instead.
fredandlunchbox@reddit (OP)
No, they support it first-party in the product. You can run claude code desktop with any model you want.
Altruistic_Heat_9531@reddit
context size maybe?
fredandlunchbox@reddit (OP)
Fails even if you just run a few messages and compact.
OneSlash137@reddit
Using a Ferrari with a 2-stroke motor powering it…
fredandlunchbox@reddit (OP)
It’s very capable on smaller-scale tasks. The prompts are very powerful, even with a weaker model.
arcanemachined@reddit
Not a Ferrari guy... do they also break down every 2 weeks?
Electrical-Shape-266@reddit
If you can’t see the prompt and output, you’re basically blind here.
merica420_69@reddit
Qwen 3.5 be like that
m94301@reddit
Is this local? You can see the query / response on most servers to help debug