Qwen 3 Max Official Benchmarks (possibly open sourcing later..?)
Posted by Trevor050@reddit | LocalLLaMA | View on Reddit | 36 comments

GreenTreeAndBlueSky@reddit
They never open-sourced their Max versions. Their open-source models are essentially advertising, and probably some distillations of the Max models.
Illustrious_Row_9971@reddit
It's available by default here for free: https://huggingface.co/spaces/akhaliq/anycoder
Finanzamt_Endgegner@reddit
Tbf, there were better smaller models available soon after, and there was never a 2.5 Max release; it was preview-only as far as I know.
entsnack@reddit
Comparison with gpt-oss-120b for reference, seems like this is better suited for coding in particular:
Neither-Phone-7264@reddit
Isn't this a 1T param model?
entsnack@reddit
It is indeed.
BackyardAnarchist@reddit
source?
shark8866@reddit
this Qwen is also non-thinking
entsnack@reddit
It's the thinking Qwen; the Qwen numbers are from the Alibaba report, not independent benchmarks.
shark8866@reddit
I would advise you to recheck that. If you look at the benchmark provided in this very post, they are comparing with other non-thinking models, including Claude 4 Opus non-thinking, DeepSeek V3.1 non-thinking (only 49.8 on AIME), and their own Qwen 3 235B A22B non-thinking. I know this because I distinctly remember Qwen 3 235B non-thinking gets about 70% on AIME 2025 while the thinking one gets around 92%.
Massive-Shift6641@reddit
I see zero improvement from this model on my tasks. Sorry, but it's likely just benchmaxxed slop.
shark8866@reddit
I see you in the LMArena server.
Independent-Wind4462@reddit
Seems good, but considering it's a 1-trillion-parameter model 🤔 the difference between it and the 235B isn't much.
But still, from early testing it looks like a really good model.
Professional-Bear857@reddit
I think that's diminishing returns at work
SlapAndFinger@reddit
At this stage RL is more about dialing in edge cases, getting tool use consistent, stabilizing alignment, etc. The edge cases and tool use improvements can still lead to sizeable improvements in model usability but they won't show up in benchmarks really.
Finanzamt_Endgegner@reddit
It's a preview, so a lot of the training is not yet done.
x54675788@reddit
Don't get your hopes up for an open-source model.
There is no incentive to spend millions of dollars on training if they can't sell you access to the best model.
JMowery@reddit
Are you donating money to the cause or paying for API access to their open-source models? If not, why do you expect everything to be free?
Sounds like you're very unappreciative. Businesses exist to make money. And while enshittification does happen, why are you making such a fuss and assuming that terrible things are going to happen, when this very same company is the only one to give us even a remotely good video model, a pretty great image model, and the best open-source coding model? Like... I don't like what's happening with big companies, but Alibaba has been pretty nice so far.
Why not wait before spewing hatred?
ohHesRightAgain@reddit
Huh, a graph that starts at 0..
lordmostafak@reddit
That's the real breakthrough here.
o5mfiHTNsH748KVq@reddit
And it’s linear 🫢
Finanzamt_Endgegner@reddit
Incredible!
HomeBrewUser@reddit
It's nothing too special. If it's actually 1T it's not really worth running versus DeepSeek or Kimi tbh.
Impressive_Half_2819@reddit
How many GPUs were used?
shark8866@reddit
This is what Meta intended for Llama 4 Behemoth.
Independent-Wind4462@reddit
Yeah, idk, there's going to be a new Meta event this month too, so maybe we'll see a model there. Let's see.
o5mfiHTNsH748KVq@reddit
I'm hoping that event is Segment Anything 3.
Salty-Garage7777@reddit
Yet its command of the Slavic languages is poor, judging by how it handled a rather simple gap-filling exercise in Polish 🤦
_yustaguy_@reddit
Not looking much better in Serbian, but still noticeably better than its smaller brothers.
No_Swimming6548@reddit
Literally unusable
Salty-Garage7777@reddit
Maybe it's better at coding at least...😩
bb22k@reddit
It's interesting that they compared it with Opus non-thinking, because Qwen 3 Max seems to be some kind of hybrid model (or they are doing routing on the backend).
You can force thinking by hitting the button, or if you ask something computationally intensive (like solving a math equation) it will just start rambling to itself (without the thinking tag) and eventually give the right answer.
Seems quick for a large model
nullmove@reddit
Qwen Chat on the web always falls back to a different model that supports the requested modality/reasoning, so you can't conclude much from this. But in the API this is non-thinking.
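If you want to poke at the API behavior yourself, here's a minimal sketch against the OpenAI-compatible endpoint that Alibaba Cloud Model Studio exposes. The base URL and the model name `qwen3-max-preview` are assumptions based on their current naming and may differ:

```python
# Minimal sketch: query the preview model through an OpenAI-compatible API.
# Assumptions: the base_url and model name follow Alibaba Cloud Model Studio's
# current conventions and may differ; set DASHSCOPE_API_KEY in your environment.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

resp = client.chat.completions.create(
    model="qwen3-max-preview",  # assumed model name
    messages=[{"role": "user", "content": "Solve 3x + 7 = 19 and explain each step."}],
)

# A non-thinking model answers directly: the response carries only the assistant
# message content, with no separate reasoning/thinking block to inspect.
print(resp.choices[0].message.content)
```

If the answer comes back as plain content with no reasoning trace, that matches the non-thinking behavior described above; the web chat's fallback routing is what makes it look hybrid.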
Yes_but_I_think@reddit
AIME 2025 is definitely memorised somehow.
infinity1009@reddit
What about thinking?
Trevor050@reddit (OP)
not out yet