Gemma 4 beats Qwen 3.5 (UPDATE), and Qwen 3.6 27B + MiniMax M2.7 is the best OpenCode setup
Posted by maxwell321@reddit | LocalLLaMA | View on Reddit | 17 comments
Hi all! I recently made a post about how Gemma 4 replaced Qwen 3.5 for me for semantic routing and a lot of my coding tasks, and ultimately became my new daily driver.
The next day, Qwen 3.6 was released, and I've been using it a lot this week. Here's my updated comparison:
Gemma 4 E4B > Qwen 3.5 4B for routing and other classification tasks. I think it might be better at English understanding, but it might not have the super technical smarts (like coding).
Qwen 3.6 30B & 27B > Gemma 4 26B & 31B (both) > Qwen 3.5 30B & 27B
Specifically, my light/fast model went through the following changes:
Qwen 3.5 30B -> Gemma 4 26B -> Qwen 3.6 30B
Gemma 4 26B also temporarily replaced my use of Qwen 3.5 27B (dense) until 3.6 came out (now I use them interchangeably).
The only Gemma model I use now is E4B for semantic routing.
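In case anyone wants a concrete picture of what I mean by semantic routing with a small model, here's a minimal sketch. The route labels, prompt wording, and endpoint are all illustrative, not my exact setup; it just assumes an OpenAI-compatible server (e.g. llama-server) running the routing model on localhost:8080.

```python
# Minimal semantic-routing sketch: ask a small local model to classify a
# request into one of a few routes, then dispatch on its answer.
# ROUTES, the prompt, and the URL are illustrative assumptions.
import json
import urllib.request

ROUTES = ["code", "chat", "summarize"]

def routing_prompt(user_msg):
    # Constrain the model to answer with a bare category label.
    return (
        "Classify the user request into exactly one of these categories: "
        + ", ".join(ROUTES)
        + ".\nReply with the category name only.\n\nRequest: " + user_msg
    )

def parse_route(reply, default="chat"):
    # Take the first word of the reply; fall back if the model rambles.
    words = reply.strip().lower().split()
    if not words:
        return default
    label = words[0].strip(".,")
    return label if label in ROUTES else default

def route(user_msg, url="http://localhost:8080/v1/chat/completions"):
    # Query the routing model via the OpenAI-compatible chat endpoint.
    body = json.dumps({
        "model": "routing-model",
        "messages": [{"role": "user", "content": routing_prompt(user_msg)}],
        "temperature": 0,
    }).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)["choices"][0]["message"]["content"]
    return parse_route(reply)
```

The parsing fallback matters more than you'd think: small models occasionally wrap the label in a sentence, so defaulting to a safe route keeps the pipeline from breaking.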
NOW, here's a new breakthrough:
I recently downloaded the MiniMax M2.7 MXFP4 weights and used them to replace Qwen 3.5 122B Q8 and Qwen 3.5 397B Q2. It's the perfect middle ground, and I haven't had any issues.
I'm trying to break away from my Claude Code Pro subscription. I normally use Sonnet 4.7 for all of my projects (I never bother with Opus since it burns through my usage), and I rarely touch Haiku unless it's a stupid easy task.
This morning I installed OpenCode and set up my llama-swap server to swap between Qwen 3.6 30B and MiniMax M2.7 (with the GGML unified memory trick), and it's been AMAZING. I'm going to continue testing further. You do need to handhold it a bit, but it's been giving great results.
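For anyone unfamiliar with the unified memory trick: as far as I know this refers to llama.cpp's `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` environment variable, which on Linux lets CUDA allocations spill past VRAM into system RAM instead of failing. A rough sketch of a launch (the model filename and exact flags are just illustrative):

```shell
# Enable CUDA unified memory so a model larger than VRAM can still load,
# spilling the overflow into system RAM (Linux only, slower but it works).
# The .gguf filename and flag values below are placeholders.
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 \
llama-server -m minimax-m2.7-mxfp4.gguf --port 8081 -ngl 99 -c 32768
```

Expect prompt processing and generation to slow down whenever the spilled portion is touched, so it's best reserved for the big "heavy" model while the fast model stays fully in VRAM.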
I haven't set up any agents yet; I've just been manually switching between the models. I've found that Qwen 3.6 30B is great for planning mode, then I have MiniMax M2.7 lay all the groundwork, then go back to Qwen 3.6 30B for edits.
I'm using the Q8 unsloth quant of Qwen 3.6 30B, and it has yet to give me any tool/command issues whatsoever through OpenCode. MiniMax M2.7 tried to just tell me what to do manually until I gently reminded it that it had the power to do it itself. Whatever tuning happened between 3.5 and 3.6 really seems to have improved its tool calling and its sense of when to use tools.
It's a very good day to code with open-source models! 2-3 years ago I remember struggling to replace ChatGPT with CodeLlama 34B; the amount of progress we've made is amazing.
Any questions lmk!
My setup: 2x RTX 3090 + 1x P40 and 128GB of DDR4
JuniorDeveloper73@reddit
qwen3.6 27B - the rest
AcrobaticChain1846@reddit
Idk man, Gemma 4 seemed like the laziest, or it requires better prompting (very detailed prompting) even for simple tasks.
Cool-Chemical-5629@reddit
Don't expect Xbox experience on Atari hardware.
AcrobaticChain1846@reddit
My Atari hardware can run Qwen 3.5 9B and Qwen 3.6 35B-A3B just fine. I compared Gemma 4 E4B against Qwen 3.5 9B, and Gemma 4 just sucks.
Howard_banister@reddit
It's rather an OpenCode problem.
billy_booboo@reddit
I've converged on the same conclusion. I really like the Gemma 4 personality and flow, and have been combining E4B with Qwen 3.6 35B as a pre- and post-processor.
jacek2023@reddit
OpenCode has serious problems, at least with local models: you'll sometimes notice very long prompt processing for no reason, because it's fucked up. I tried Roo Code, Mistral Vibe, and pi recently. They work much faster (I mean in real work, over many hours).
relmny@reddit
I haven't tried any of them yet, but I'm curious about Aider. Any opinions?
Just_Maintenance@reddit
Which one do you like most? I've been using OpenCode and it works pretty well, ignoring the constant prompt processing.
jacek2023@reddit
I need to work with pi for longer, but initially it's much better than OpenCode. It's much faster, and skills work correctly.
xAragon_@reddit
The base prompts of OpenCode are pretty terrible. I've overridden most of them with custom prompts.
Truth-Does-Not-Exist@reddit
Pi > Opencode
PhilippeEiffel@reddit
Multiple times in your message you say "Qwen 3.6 30B"
Do you mean 27B or 35B?
maxwell321@reddit (OP)
Ah yes, sorry, 35B. I wrote this before going to bed and was pretty tired.
mp3m4k3r@reddit
Or maybe just average them and call it 31B /s
Creepy-Bell-4527@reddit
For me personally Gemma 4 has been an absolute godsend.
Previously I felt like I had to compromise between tool calling, instruction following, and creative writing in an agentic content planner. Gemma 4 has the creative writing prowess of Gemma 3, plus tool-calling accuracy, consistency, and instruction following that frankly rival SOTA models.
Even the E4B models, which are less effective at creative writing, absolutely kill it at tool calling and instruction following.
DefNattyBoii@reddit
Great setup! I assume by 30B you mean the 35B MoE model. Do you mind sharing your config? Just ask Qwen to generate a mermaid diagram, because I'm not sure I get how you use Qwen 4B for routing lol.
btw, what is the "GGML unified memory trick" and how does it work? When I swap models with llama-server, my KV cache is discarded, plus loading a new model takes quite some time (even without warmup) if I have to use system RAM.