Is anyone getting real coding work done with Qwen3.6-35B-A3B-UD-Q4_K_M on a 32GB Mac in opencode, claude code or similar?

Posted by boutell@reddit | LocalLLaMA | View on Reddit | 116 comments

I'm running Qwen3.6-35B-A3B-UD-Q4_K_M on an M2 Macbook Pro with 32GB of RAM. I'm using quite recent builds of llama.cpp and opencode.

To avoid llama-server crashing outright due to memory exhaustion, I have to set the context window to 32768 tokens. This turns out to be important.

As a hopefully reasonable test, I gave opencode a task that Claude Code was previously able to complete with Opus 4.7. The project isn't huge, but the task involves rooting around the front and back end of an application and figuring out a problem that did not jump out at me either (and I was the original developer, pre-AI).

The results are really tantalizing: I can see it has figured out the essentials of the bug. But before it can move on to implementation, compaction always seems to throw out way too much info.

If I disable the use of subagents, it usually survives the first compaction pass with its task somewhat intact, because I'm paying for one context, not two.

But when I get to the second compaction pass, it pretty much always loses its mind. The summary boils down to my original prompt, and it even misremembers the current working directory name (!), coming up with a variant of it that of course doesn't exist. After that it's effectively game over.

After reading a lot about how Qwen is actually better than most models with regard to RAM requirements, I've come to the conclusion that (1) 32768 is the biggest context I can get away with, and (2) it just ain't enough.

Has anyone had better results under these or very similar constraints?

Thanks!