Xiaomi mimo-v2.5 pro (MIT license) surpasses Opus 4.5 on arena

Posted by Terminator857@reddit | LocalLLaMA | View on Reddit | 21 comments

Many asked when we would have an open-weight model that is better than Opus. Well, now we have it. Mimo is ranked #9 and Opus 4.5 is ranked #10.

https://arena.ai/leaderboard/text/coding-no-style-control

[-]

LoveMind_AI@reddit

So I just gotta say... a couple days into using this now and I am just blown away. It's absolutely just as good as Opus. Zero tears shed for Claude turning into a beep-boop automaton.

[-]

someone383726@reddit

Just waiting on my allowance from my mom for making my bed this week then I will make the purchase!

[-]

UnbeliebteMeinung@reddit

I hope you have enough good boi points

[-]

Ok-Contest-5856@reddit

Wasn’t GLM 5.1 ahead of Opus 4.5 for a while, and then they updated the leaderboard and it dropped significantly? Anyone know what happened?

[-]

It looks like 5.1 still matches Opus 4.5 on average, but with higher variability. And Mimo v2.5 is 1 point ahead of both, but with even higher variability. So Opus 4.5 and GLM 5.1 are more consistent.

[-]

XTCaddict@reddit

Aren’t these rated by human people? If so, I don’t know if "consistent" is the correct term, because Mimo had its weights dropped, which would have led to more people reviewing it.

[-]

-dysangel-@reddit

the tests are blind - people don't know which model is which. That model is new, so it has fewer ratings

[-]

XTCaddict@reddit

Ahhh ok fair enough

[-]

9gxa05s8fa8sh@reddit

what happened is human psychology. people's brains see expensive things as better. when you do blinded tests like in science or llmarena, you see the truth

[-]

Terminator857@reddit (OP)

Good point. With more votes it will likely drop below Opus, since that has been the trend.

[-]

SmartCustard9944@reddit

One order of magnitude fewer votes. It is too early.
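To put a rough number on why the vote count matters, here is a minimal sketch. It assumes a plain win rate with a normal-approximation confidence interval, which is a simplification and not necessarily how the arena computes its scores (the thread doesn't say). The vote counts are hypothetical, picked only to show a 10x gap.

```python
import math


def win_rate_ci_half_width(wins: int, votes: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation ~95% CI for a win rate."""
    p = wins / votes
    return z * math.sqrt(p * (1 - p) / votes)


# Hypothetical counts (not real leaderboard data): same 55% win rate,
# but the newer model has ten times fewer votes.
established = win_rate_ci_half_width(wins=5500, votes=10000)
new_model = win_rate_ci_half_width(wins=550, votes=1000)

# With the same win rate, the interval widens by sqrt(10) for 10x fewer votes.
ratio = new_model / established
print(f"established: +/-{established:.4f}")
print(f"new model:   +/-{new_model:.4f}")
print(f"ratio:       {ratio:.2f}")
```

Under these assumptions, ten times fewer votes makes the interval about three times wider, so a 1-point lead can easily sit inside the noise.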

[-]

Feisty-Patient-7566@reddit

Is Opus 4.5 still being rigorously tested? I'd say the expectations for a model are higher now than when Opus 4.5 was released, so Mimo's high score makes it look better.

[-]

andy_potato@reddit

Yet another benchmaxxed model

[-]

Terminator857@reddit (OP)

How does one benchmax arena coding?
