AI Model Reviews

Posted by Typical-Tomatillo138@reddit | LocalLLaMA | 46 comments

LLM benchmarks are terrible. Everyone overfits their models, so any new benchmark gets maxed out within a few months of its release. Open-source models launch with headlines like "90% of Opus at 5% of the cost", yet anyone who has actually used them can feel the obvious difference in quality.

It's impossible to find good reviews of models any more. Every result for the Google search "minimax m2.7 review" is one of three things:

  1. AI-written slop blog posts made in 10 minutes. These are the worst.

  2. Meaningless benchmark results, either from the big orgs (overfitted) or from personal tests (which don't translate between use cases).

  3. Reddit threads full of conflicting information: comments are evenly divided between GLM, Qwen and Minimax, and everyone reports different quality.

Are there any good sources for model reviews left in 2026? I can't seem to find any.