Is qwen3 coder next still relevant with qwen3.5 release for agentic coding?
Posted by ROS_SDN@reddit | LocalLLaMA | View on Reddit | 30 comments
Basically the title. I know it will depend on your quant, but with 48gb of vram inbound, I'm curious about the community's opinion before I get the chance to vibe check.
I see a lot of people saying 35b / 27b is better, and I'm curious what a more focused discussion on this brings up.
PromptInjection_@reddit
At least for me it is not relevant any longer.
Qwen 3.5 is clearly superior.
Far-Low-4705@reddit
For me it is either qwen 3.5 35b at Q8 or qwen 3 coder next 80b at Q4
Both run at 50 T/s, but I have no idea which is better.
Rn I’m leaning towards 3.5, it has (toggle-able) reasoning which is a big plus imo, and vision. Only downside is it’s 2x smaller.
ROS_SDN@reddit (OP)
How do you toggle the reasoning without a llama server reset? Any idea?
Far-Low-4705@reddit
this is what i have for llama-swap for 27b
idk if u use llama-swap, but it lets u swap models easily and define a config file for swapping models
basically route "thinking" prompts to the `qwen3.5-27b-thinking:Q4_0` id and instruct prompts to the `qwen3.5-27b-instruct:Q4_0` id.
Also you can use --reasoning on/off flag for llama-server instead of setting chat-template-kwargs
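The routing described above could look roughly like this. A sketch only, not the commenter's actual config: the model paths are made up, and the `enable_thinking` template kwarg is an assumption based on how llama.cpp's `--chat-template-kwargs` flag feeds JSON into Qwen-style chat templates.

```yaml
# llama-swap config sketch (hypothetical model paths).
# Each model id maps to its own llama-server command; llama-swap
# starts and stops servers as requests name different ids, and
# ${PORT} is substituted by llama-swap at launch.
models:
  "qwen3.5-27b-thinking:Q4_0":
    cmd: >
      llama-server --port ${PORT}
      -m /models/qwen3.5-27b-Q4_0.gguf
      --jinja --chat-template-kwargs '{"enable_thinking": true}'
  "qwen3.5-27b-instruct:Q4_0":
    cmd: >
      llama-server --port ${PORT}
      -m /models/qwen3.5-27b-Q4_0.gguf
      --jinja --chat-template-kwargs '{"enable_thinking": false}'
```

Your client then just sets the `model` field in the OpenAI-style request to one of those ids, and llama-swap swaps in the matching server, so no manual restart needed.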
ionizing@reddit
Agreed. I find 122B superior to Coder Next in my framework, so much so that I simply deleted the model rather than keep it around for potential use.
RedParaglider@reddit
You must be one of those mythical 256bros :)
mr_zerolith@reddit
Senior developer here.
I don't find 122b to be particularly good; GPT OSS 120b is much faster and equivalent or a little better in quality.
Qwen Next Coder was super unimpressive.
JsThiago5@reddit
there is a gpt oss puzzle that is 88b and seems to have the same performance as 120b
mr_zerolith@reddit
but does it have the quality..
my_name_isnt_clever@reddit
What? I swapped out GPT-OSS for Qwen 3.5 122b and it kicks its ass at everything I've tried. It's not even close on tool call reliability.
mr_zerolith@reddit
ah, my assessment doesn't include tool usage at all, just code generation from a prompt
my_name_isnt_clever@reddit
Well yeah, it's double the active parameters so it was always going to be slower. I'm not surprised you prefer gpt-oss for direct chats, but qwen 3.5 is an agentic workhorse.
mr_zerolith@reddit
Good to know.
I dig Seed OSS 197B because i get ~20% better speed than Qwen3.5 133b and it seems good with agentic and is awesome at coding.
For coding purposes i just couldn't tell the difference between the two ~120b models
madtopo@reddit
Do you use GPT-OSS 120B with a harness? If so, which one? I found its speed impressive but the tool calling was underwhelming
ForsookComparison@reddit
I'd rather wait for a good output from 27B than quickly get slop from Qwen3.5-Next.
And heck, sometimes Qwen3.5-35B gets the best of both worlds if your task is simple enough.
Far-Low-4705@reddit
Idk, I feel like having the ability to iterate quickly and tell the model where it went wrong in 5min is better than getting it right first try but waiting 15min for it.
Just my opinion, I’d rather be in the loop and iterate faster. I see it as an assisting tool, not as a fully automated coder. And I feel like iterating is often better
ForsookComparison@reddit
If you stick with that mindset you'll never move past the "coding assistant" phase
Far-Low-4705@reddit
i dont want to have to debug, and learn thousands of lines of buggy code that i did not write.
if you are "vibe coding" then yeah sure, but if you're doing real work, that is absolutely not feasible.
ForsookComparison@reddit
That's fine and has plenty of merit. I just do not think it's a winning strategy in the long-term unless your goal is to gather everyone else's dust on your clothes.
I'm not even disagreeing with you, I just have a family to feed lol
Far-Low-4705@reddit
sure, i just prefer to get more work done rather than less.
that is just my preference, but do whatever you want to do
KURD_1_STAN@reddit
True, i like to use free claude more than free gpt, but we're not talking about such models. 27b or 397b, both are gonna make mistakes even for simple things
ForsookComparison@reddit
Compared to Opus and Sonnet? Yes definitely
Compared to Qwen3-Next/Coder? It's way more self-sufficient on its own without hand holding
asfbrz96@reddit
Gemma 31b is pretty solid
stormy1one@reddit
Serious question - what are you using it for? What has been your experience in comparison to Qwen models? I never had much luck with any Google models - they work great with small snippets, but trip over themselves with larger code bases
asfbrz96@reddit
I'm using it as my coding agent. I have GPT-5.4 as an orchestrator that sends tasks to my local models. What I notice is that GPT-3.5 always breaks tool calls and just stops mid-process.
AvocadoArray@reddit
27b is better in almost every way. The biggest difference is how thorough it is when writing plans/specs and thinking through edge-cases. It remembers details over long contexts where Q3CN and even 3.5 122b fall short, and it can actually get itself out of failure loops in most cases.
That makes it perfect for planning and executing long ralph loops. I let one run the other night to build a TUI interface to replace one of my bash CLI tools. It ran for over an hour before it finally finished, and it implemented the feature perfectly. The only downside is that it took the instructions on writing extensive unit tests too seriously and ended up writing 300+ tests for silly failure modes, like verifying that calling `docker ps` fails if Docker is not installed.
The larger MoE models are sometimes better when working with a less popular language or framework, but I prefer 27b with tooling that allows it to search the web, check reference docs, or look at the library's source to get the info it needs.
dyslexic_jedi@reddit
I still prefer the Qwen3 Coder 80b at Q8. I think it beats other local models at coding.
RedParaglider@reddit
Better than 122b and faster.
Voxandr@reddit
Still a better coder than 122b.
PromptInjection_@reddit
Yeah, the 122B might struggle here and there. But what I find really practical is precisely its visual capability. With Qwen 3 Coder, you can't just say, "Look at this design and make something similar for me."