Qwen3.5/3.6 Coder?
Posted by ComplexType568@reddit | LocalLLaMA | 48 comments
With practically all of LocalLLaMA glazing Qwen 3.5/3.6 for its coding skills, along with the fact that Alibaba themselves are focusing on making Qwen a reliable coding agent, does this rule out the chance of a new Qwen Coder? I wonder if they'd just focus on making the vanilla Qwen models capable in all areas, including coding, or if they'd double down and release another coder/agent variant... I think if they did, looking at how well Q3CN holds up, it would probably wreck the market for a long, long while, especially if they keep that sweet 80B A3B model arch.
Or maybe they'd just release Q4 Coder. Who knows at this point.
Technical-Earth-3254@reddit
The dense 27B is good enough, but the 35B or a larger upcoming model? Would love that. Especially a new 80B-Class Coder MoE (but with more than A3B) would still be awesome.
knownboyofno@reddit
It would be cool if it was 80B-A20B, or better yet, adjustable from 3 to 20! One can dream!
StardockEngineer@reddit
I almost don't feel it's necessary anymore. 27b is crazy
vr_fanboy@reddit
Crazy indeed, I'm getting better results than Sonnet for some tasks. Today I was doing a maliciousness assessment of lean-ctx (https://github.com/yvgude/lean-ctx): Sonnet gave a very shallow analysis, didn't crawl the repo, immediate "THIS IS BAD" for some nuanced reason. On the other hand, Pi + Qwen 3.6 27B spent 5 minutes scraping the repo and produced a very thorough analysis. Pasted the report to Sonnet and got the classic 'oh, you are completely right bla bla'.
Sonnet might be the stronger model, but I think the bloat in CC is drowning it in context rot.
DefNattyBoii@reddit
What were the results for lean-ctx?
vr_fanboy@reddit
Compile it yourself, don't update it without reviewing, remove everything network-related at build time if you're not using remote telemetry, don't trust the binaries. It's a MITM between all your coding agents, very high-risk software.
Karyo_Ten@reddit
How do you set up Pi? Seems like the Arch Linux of agent harnesses.
Virtamancer@reddit
Is OpenCode a good middle ground?
Opinionated decisions made for you, but probably sane defaults for most use cases, and prompt context is injected to steer the agent(s) but it’s (presumably?) concise/optimized and not 40k off the rip + the tens of thousands of tokens repeatedly dumped in by the CC harness (and, recently, by the Anthropic backend, too).
SourceCodeplz@reddit
Indeed. It is the age of the harness right now. Special harnesses for special tasks.
txgsync@reddit
I vibe with this. 27b's analytical ability is... ludicrously good. Beating gpt-oss-120b in qualitative measures. It might become my daily driver.
ComplexType568@reddit (OP)
I think as long as there's a lab that's ahead of Qwen, they will always compete to be the top. I could definitely see them trying to target TRUE (not just benchmark) Opus capability in 27-80B params.
stormy1one@reddit
Fully agree - without competition to drive innovation, I doubt we would have been given anything at all, for free.
sergeialmazov@reddit
I don't want a crazy model. I want a sane and reasonable model.
Raredisarray@reddit
I'd love another 80B A3B coder or all-arounder.
soyalemujica@reddit
I doubt it; the 27B 3.5 dense beats the 80B A3B, so if we get another one, it will still be the better pick.
Raredisarray@reddit
Does it ?? I saw some benchmarks for 27B barely beating the 35B a3b … so I was assuming another 80B a3b would be better.
the__storm@reddit
True, but I can't (effectively) run 27B, so I'll hold out hopium for another 80B-A3B.
gtrak@reddit
A better comparison is that 3.5 27b was competitive with 3.5 122b-a10b.
PrysmX@reddit
Qwen3-Coder-Next is my favorite local model for coding and agentic tasks.
stormy1one@reddit
Have you tried Qwen3.6-27B? After a few days with it, I no longer reminisce about Coder Next.
txgsync@reddit
Your experience rhymes with mine. 3.6-27B is slower, but more accurate with better tool calling.
Any idea why it's objectively better at coding? I've just started goofing with it this weekend at full precision -- 27B is incredibly sensitive to quantization, so I'm working through what layers to preserve at int8 vs. fp16 with A/B performance/quality/repetition tests right now -- but Qwen3-Coder-Next seemed really tolerant to quantization in a way that the 27B is not.
I wish Qwen had included a "layers to preserve" directive in their model card like GPT-OSS models did. Abliterating then re-quantizing GPT-OSS models was just an engineering exercise. With Qwen3.6, it's more like a murder mystery.
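The per-layer A/B sweep described above could look something like this sketch. Everything here is hypothetical: `pick_precisions`, the scoring callback, and the sensitivity numbers are made up to illustrate the idea, not any real quantization API.

```python
# Hypothetical sketch of a per-layer sensitivity sweep: quantize one layer
# at a time, score the model, and keep fp16 for layers whose quality drop
# exceeds a budget. The scoring function stands in for whatever your real
# harness measures (perplexity, repetition rate, etc.) -- it's an assumption.

def pick_precisions(layers, baseline_score, score_with_int8, budget=0.01):
    """Return {layer_name: 'int8'|'fp16'}, keeping fp16 where int8 costs too much."""
    plan = {}
    for layer in layers:
        drop = baseline_score - score_with_int8(layer)  # quality lost by quantizing this layer
        plan[layer] = "int8" if drop <= budget else "fp16"
    return plan

# Toy run with made-up per-layer sensitivity numbers:
sensitivity = {"embed": 0.05, "attn.0": 0.002, "mlp.0": 0.004, "lm_head": 0.08}
plan = pick_precisions(sensitivity, 1.0, lambda layer: 1.0 - sensitivity[layer])
print(plan)  # embed and lm_head stay fp16; the insensitive layers go int8
```

Embeddings and the output head tend to be the usual suspects for preservation, which lines up with what model cards like GPT-OSS's spell out explicitly.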
ea_man@reddit
I say that the 27B is the Coder; the coming (I hope) Qwen3.6 9B or 4B is gonna be the agent. If such a small model is gonna be good at tools and fucking fast...
You use the big guys for planning and solving problems, and keep a swarm of the quick small ones to apply the changes.
Still, I'd like a ~20B Coder for those with just 16GB, I mean something you can run at Q4_K_M with Q8 KV on a 16GB card, because the 27B now wants 24GB and that's not friendly.
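The 16GB-vs-24GB point can be sanity-checked with back-of-envelope arithmetic. The bits-per-weight figure and the layer/head/context numbers below are rough assumptions for a 27B-class dense model, not the real Qwen layout:

```python
# Ballpark VRAM estimate: weights at ~4.85 bits/param (typical effective
# rate for a Q4_K_M-style quant) plus a Q8 KV cache (~1 byte per element).
# All config numbers are assumptions for illustration.

def weights_gib(params_b: float, bits_per_param: float) -> float:
    """Approximate weight memory in GiB for `params_b` billion parameters."""
    return params_b * 1e9 * bits_per_param / 8 / 2**30

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context: int, bytes_per_elem: float) -> float:
    """Approximate KV cache: 2 (K and V) * layers * kv_heads * head_dim * context."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_elem / 2**30

w = weights_gib(27, 4.85)                   # ~15.2 GiB just for the weights
kv = kv_cache_gib(64, 8, 128, 32768, 1.0)   # Q8 KV at 32k context, ~4 GiB
print(f"weights ~{w:.1f} GiB + KV ~{kv:.1f} GiB = ~{w + kv:.1f} GiB")
```

Under those assumptions a quantized 27B plus a long-context KV cache lands around 19 GiB before activations and overhead, which is why it's comfortable on 24GB but not on 16GB.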
madtopo@reddit
Speaking purely in terms of parameter count, the 27B is not the Coder, because Qwen 3 Coder Next has 80B params.
txgsync@reddit
Qwen3-Coder-Next had more parameters, sure. But Qwen3.6-27B has better -- if significantly slower -- outputs.
I'm trying to wrap my head around *why*. Because I don't quite understand why and this vexes me.
social_tech_10@reddit
Qwen3.6-35B beats Qwen3.5-397B in most benchmarks, and that's a model roughly 5x the size of Qwen 3 Coder. And Qwen3.6-27B is measurably smarter than the 35B-A3B because it's dense and uses all 27B parameters on every token, whereas the MoE 35B only uses 3B per token. So it's a lot slower, but the 27B crushes the benchmarks.
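The dense-vs-MoE trade-off above comes down to active parameters per token. A crude rule of thumb is ~2 FLOPs per active parameter per token; the sizes below are just the nominal figures from this thread, treated as assumptions:

```python
# Back-of-envelope: forward-pass compute per token scales with *active*
# parameters (~2 FLOPs per active param). Nominal sizes only.

def flops_per_token(active_params_b: float) -> float:
    return 2 * active_params_b * 1e9

dense_27b = flops_per_token(27)   # dense: all 27B params fire on every token
moe_a3b = flops_per_token(3)      # MoE A3B: only ~3B active per token

print(f"dense 27B spends ~{dense_27b / moe_a3b:.0f}x the compute per token")
```

So the dense model pays roughly 9x the compute per token, which is exactly the slower-but-smarter trade-off the comment describes.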
ComplexType568@reddit (OP)
Knowledge amount is still important, and I believe Q3CN has that edge. Following 3.5's increases in param counts, I think Q3.5C will probably be larger, or smaller (maybe 2 variants, like a 35B and a 122B), but I'd say 60B A3B would be a sweet spot if they want to compete with DeepSeek's new 1M-token attention while running on 2x 3090s.
kamikamen@reddit
Does knowledge matter if you equip it with something like tavily+docs?
Like intuitively, I'd prefer a very smart model that is great at tool calling but with little world knowledge, to the opposite. Claude has great knowledge, but it states so confidently that things do or do not exist until you explicitly tell it to look it up and then it agrees with you. Something that has the minimal amount of world knowledge to hold a coherent thought and has to rely on a search tool for the rest seems more helpful/trustable than the alternative.
Karyo_Ten@reddit
Qwen3.6 is not Qwen3 and Qwen2.5-coder was 32B, it's not about size but training.
mateszhun@reddit
It would be cool to have a 80-100B A10B
NNN_Throwaway2@reddit
3.6 feels like it could just as well have been the "coder" release. I'd be surprised if they then went and did a coder on top of that.
txgsync@reddit
Nailed it. The 27B excels at analysis. It's not quite as standoffish as Qwen2.5-coder was -- that model would essentially tell me to fuck off unless I wanted it to write code -- but not very "friendly", either
Fabulous cosmic reasoning power. But it's a terrible wannabe digital therapist LOL :)
alphatrad@reddit
Qwen Coder Next just came out in Feb 2026 - it wasn't that long ago. But certainly before 3.5 & 3.6
3.6 is pretty solid.. but still struggles with things.
The problem I have with these is that they've been trained to do tool calls and agentic stuff, but their actual coding ability, if you look, is higher than Sonnet 3.7 and just a few points below Sonnet 4.
So you have to reframe how you use these to early 2025.
And reminder; Claude code came out in February 2025 with Sonnet 3.7 !!!
The problem is, a lot of us are trying to work with these models like TODAY's frontier models, because they can do all the same tool calling and AGENT stuff.
But they actually have last years intelligence.
But, that's still HUGE when you think about it.
A model that runs on consumer hardware is coding as good as Claude Code when it came out.
So... will they make another coder?
Maybe... but maybe not. It depends where they are aiming.
It seems in the past couple of months with Agents, people are moving away from just general chat in a webui.
Which means what the model can do has to evolve somewhat.
And I have a hunch they are trying to follow Anthropic.
Make a local model good at doing stuff on the desktop. Good at being an Agent.
¯\_(ツ)_/¯ Could be totally off base here and totally stupid.
gtrak@reddit
How are you measuring intelligence here?
alphatrad@reddit
Using the EVALS they post, mostly focused on SWE-bench because I code. Sonnet 3.7 was at 72% and Qwen scored 75%, so my logic is sound. But reading comp is hard. They have had a whole year to improve and learn from the big models on the AGENTIC stuff versus last year, which I've indicated.
https://qwen.ai/blog?id=qwen3.6-27b
https://www.anthropic.com/news/claude-3-7-sonnet
https://www.anthropic.com/news/claude-4?c=6709
Mr_Moonsilver@reddit
How do you determine that Qwen 3.6 27b is worse than Sonnet 4?
alphatrad@reddit
I was going off their own evals. SWE bench.
For my work it's scoring closer to sonnet 4.5 which honestly I haven't enjoyed the recent tweaks or whatever the hell they've been doing to Claude since February.
Mr_Moonsilver@reddit
Qwen 3.6 27b scores more than 4 pts higher than Sonnet 4; it's on the level of Codex 5.1 high.
vr_fanboy@reddit
I have been AI coding since ChatGPT 3.5, and Sonnet 3.7 had TONs of issues and no tool call / MCP / skill system access. Today's Qwen 3.6 + coding harness is nowhere near that primitive. I remember bitching because it changed THE WHOLE FILE for a couple of edits (all models did this back then), even inside Cursor. I'm asking the same stuff of Sonnet and it's giving me better takes sometimes. The major issue with local LLMs is throughput: I could not replace multiple CC instances with a 3090. And of course, if you need very complex planning / implementation / research, you should use a big fat frontier model.
ComplexType568@reddit (OP)
Off topic but I think Qwen, if following Anthropic, would want to compete with Cowork, because Pi is basically the new kid on the block and is pretty hard to compete with.
gtrak@reddit
There are some interesting coding fine-tunes on huggingface for 3.5 and I expect to see that again.
ComplexType568@reddit (OP)
I don't think anybody does it as well as Qwen does when it comes to fine-tuning their own models
gtrak@reddit
If I find one that doesn't lose on benchmarks, but is more specific to my use cases and preferences, that's better.
AppealSame4367@reddit
I don't understand what would be different for a "coder" model?
You can already disable thinking and the thinking in pi cli is already sparse. So what would a coder version do different or better?
Lesser-than@reddit
technically I think we got 3.5 coder first with qwen3-coder-next, we might get blessed with another remix at some point but I am not holding my breath.
ComplexType568@reddit (OP)
Yeah, I'd say the "next" in Q3CN basically was like a 3.5 beta, because most architectural implementations for 3.5 were almost complete by Next. Hope it gets a reboot though
FullstackSensei@reddit
What would the coder model bring exactly?
I think the past coder models were tests while the Qwen team figured how to build their coding training pipeline, and 3.6 is the fruition of that. There's no reason to train/tune a coding specific model if the coding pipeline is part of the base model training and not "just" a fine tune.
ComplexType568@reddit (OP)
We don't really know what we're missing until we see it ourselves. Who knows what Q3.6C could bring? While I do understand that 3.6 was meant to be mainly an agent, it still says it's focused on other environments too, like being a general chatbot, etc. The coder variants specifically target coding/agentic tasks, and with a lab like Qwen, who knows how far that could go.
EggDroppedSoup@reddit
I think a coder model doesn't really appeal anymore unless it's larger than 80B, since enthusiasts have the tech. Also, a model that doesn't train only on coding performs better in IRL scenarios.