It's been a while since we had new Qwen & Qwen Coder models...

LMArena is not a reliable source of which models are best, especially for coding models - it has a very limited context size and the UI is geared for people to enter short prompts.

[-]

Ok_Warning2146@reddit

Well, I was just suggesting a reason why they didn't publish qwen3 yet. It would be nice if they can also beat gemma 3 at lmarena.

[-]

SkyFeistyLlama8@reddit

For what it's worth, in practical usage Gemma 3 27b is better for Python and C# coding than QWQ 32B or Qwen 2.5 Coder 14B.

[-]

deldongoo@reddit

Do you have any metrics to support this?

[-]

SkyFeistyLlama8@reddit

Metrics? Don't need no stinkin metrics. Imperial's the way to go. Compare Gemma 27B against Qwen 2.5 Coder 32B and see what you get with your favorite programming languages.

[-]

Regular_Working6492@reddit

Qwen scores better on the aider leaderboard, FWIW

[-]

Blues520@reddit

This is the one I'm waiting patiently for. I hope they spend their time creating a quality model that we can use for a long time. Qwen 2.5 has been stellar, so they can't drop the ball on this.

[-]

sammcj@reddit (OP)

It really has! I hadn't replaced it until recently with GLM-4 32b.

[-]

umataro@reddit

And is it an improvement? Which languages do you use it for? What sort of problems?

[-]

sammcj@reddit (OP)

Very much so, it's like a slightly better version of QWQ but without the reasoning/thinking overheads.

[-]

ForsookComparison@reddit

It's good at one shots but very poor at editing and instruction following. The Qwen family crushes it in editors despite losing in one-shot scenarios.

[-]

I'll second this sentiment. I have a bunch of non-coding tests I run on LLMs and GLM-4-32B doesn't really do particularly well. The one thing I haven't tried with it (and that people seem to be excited about) is their one-shot code generation... but honestly I personally don't have much use for one-shotting code (so I can't really comment on whether it's actually good on that front).

In short, GLM-4-32B seems rather meh to me and I don't understand why people are swearing by it..

[-]

ForsookComparison@reddit

Agreed. It's one shot abilities are amazing for 32B.. it can trade blows with QwQ without reasoning tokens, but after that first shot it all falls apart into mediocrity.

[-]

CheatCodesOfLife@reddit

Isn't Qwen kind of crap at this though (despite being the third best local model / best easy to run local model)?

https://aider.chat/docs/leaderboards/

[-]

ForsookComparison@reddit

You can build and edit/iterate with Qwen. Although yes it will eventually reach a point where complexity is too much to edit competently, Qwen Coder can be used with like . aider and roo very well for quite a while

[-]

CheatCodesOfLife@reddit

I'll have to give them a try with roo.

eventually reach a point where complexity is too much to edit competently

When I was using it, I pretty much got used to starting a new context after about 16k tokens.

[-]

umataro@reddit

Thank you, I'll give it a try. My use case is mostly DevOps, so bash, terraform, ansible and python.

I kind of gave up on looking for new models after qwen2.5-coder:32b and deepseek-r1:32b because these were finally good enough and there are just too many coming out every week.

[-]

CheatCodesOfLife@reddit

GLM-4 32b is good with this.

[-]

AdventurousSwim1312@reddit

My guess is that they intended to release earlier this month, with sota results, but with the release of gemini 2.5 they delayed a bit to tune on gemini output.

Gonna be great.

[-]

Ylsid@reddit

The words are spoken! Ready the quant engines!

[-]

SkyFeistyLlama8@reddit

QAT. Please do QAT versions of these so Qwen can compete against Gemma-3.

[-]

dampflokfreund@reddit

that would be nice. But Qwen 3 won't have Vision capabilities, so in my opinion G3 will still be ahead.

[-]

Better_Story727@reddit

I think QWen3 may encounter problems because its goal is to maximize community influence. Parameters like 15BA2B have the potential to maximize community influence, but it is impossible to achieve leading performance. I estimate that they may encounter similar difficulties as Llama4 in balancing performance and parameters.

[-]

nullmove@reddit

Qwen has never been about absolute leading performance, they play a different game as you noted. They have always had trouble scaling up (their 100B tier enterprise max models have barely ever been any better than 72B open weight ones).

For that matter, Llama 3 was never leading in the overall sense either. Zack saying Llama 4 would be SOTA was interesting, but quite simply a lot of stuff happened since then.

[-]

tengo_harambe@reddit

The only confirmed Qwen3 model sizes so far are 7B and 15B MoE. I think the worry is not in scaling up but down, especially with an MoE of that size which has been unheard of before. The Qwen team admittedly loves its performance metrics so I wonder if they and others are experiencing performance anxiety after seeing Llama 4's reception.

[-]

stoppableDissolution@reddit

Theres 3b moe from ibm, so not that unheard of

[-]

Evening_Ad6637@reddit

Yes it absolutely not that unheard. There is also olmoe with 7B and Deepseek coder lite with 16b moe

[-]

faldore@reddit

Qwen3 is frankly too small to be exciting. As much as I loved Qwen2.5.

[-]

Mushoz@reddit

Qwen2.5 was also committed with only one size, but ended up with many different sizes. I reckon Qwen3 will be the same. Only one size of each architecture (dense + MoE) has been posted about.

[-]