MiniMax-M2 cracks top 10 overall LLMs (production LLM performance gap shrinking: 7 points from GPT-5 in Artificial Analysis benchmark)

Posted by medi6@reddit | LocalLLaMA | View on Reddit | 28 comments

I've been analysing the Artificial Analysis benchmark set (94 production models, 329 API endpoints) and wanted to share some trends that seem notable.

Context
This covers models with commercial API access, not the full experimental OS landscape. So mostly models you'd actually deploy out of the box, rather than every research model.

The gap between the best tracked OS model (MiniMax-M2, quality 61) and the best proprietary model (GPT-5, 68) is now 7 points. Last year it was around 18 points in the same dataset. Linear extrapolation suggests parity by Q2 2026 for production-ready models, though obviously that assumes the trend holds (and Chinese labs keep shipping OSS models).
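For concreteness, here's the back-of-envelope arithmetic behind that parity estimate: a minimal sketch assuming the two gap measurements are roughly a year apart (the dates in the comments are my assumption, not from the AA data):

```python
# Linear extrapolation of the OS-vs-proprietary quality gap, assuming the
# ~18-point gap was measured roughly one year before today's 7-point gap.
gap_last_year = 18          # quality-index points (assumed ~late 2024)
gap_now = 7                 # GPT-5 (68) minus MiniMax-M2 (61), late 2025

closing_rate = gap_last_year - gap_now     # ~11 points per year
years_to_parity = gap_now / closing_rate   # ~0.64 years, i.e. ~7-8 months

print(f"closing rate: {closing_rate} points/year")
print(f"parity in ~{years_to_parity:.2f} years")   # lands around Q2 2026
```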

What's interesting is the tier distribution:

- Elite (60+): 1 OS, 11 proprietary
- High (50-59): 8 OS, 8 proprietary (we hit parity here)
- Below 50: OS dominates by volume

The economics are pretty stark.
OS average: $0.83/M tokens.
Proprietary average: $6.03/M tokens.
Value leaders like Qwen3-235B are hitting ~228 quality per dollar vs ~10-20 for proprietary elite models (a rough metric I tried playing with: quality per dollar = quality index ÷ price per M tokens).
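As a sketch of that metric (the function is mine, not part of the AA dataset, and the per-model prices below are assumed figures chosen to roughly reproduce the numbers quoted above):

```python
# Quality per dollar = quality index / blended price per million tokens.
def quality_per_dollar(quality_index: float, price_per_m_tokens: float) -> float:
    return quality_index / price_per_m_tokens

# Qwen3-235B: a quality index of ~59 at an assumed ~$0.26/M tokens
# gives roughly the ~228 figure quoted above.
print(quality_per_dollar(59, 0.26))   # ~226.9

# A proprietary elite model at quality 68 and an assumed ~$5/M tokens:
print(quality_per_dollar(68, 5.0))    # ~13.6, inside the quoted 10-20 range
```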

Speed is also shifting. OS on optimised infra (Groq, Fireworks) peaks at 3,087 tok/sec vs 616 for proprietary. Not sure how sustainable that edge is as proprietary providers invest in inference optimisation.

Made an interactive comparison: whatllm.org
Full write-up: https://www.whatllm.org/blog/open-source-vs-proprietary-llms-2025

Two questions I'm chewing on:

  1. How representative is this benchmark set vs the wider OS ecosystem? AA focuses on API-ready production models, which excludes a lot of experimental work, fine-tuned models, etc.

  2. Is there a ceiling coming, or does this compression just continue? Chinese labs seem to be iterating faster than I expected.

Curious what others think about the trajectory here.