Opus 4.5 claims 1st place on fresh SWE-bench-like problems in October [SWE-rebench]
Posted by Fabulous_Pollution10@reddit | LocalLLaMA | 3 comments
Hey everyone,
We were excited about yesterday's release of Opus 4.5 and rushed to update the SWE-rebench leaderboard.
As generally expected, Opus 4.5 has claimed first place. Remarkably, it is much more cost-efficient than Opus 4, and only slightly more expensive per problem than Sonnet 4.5.
Check out the full leaderboard. Feel free to reach out if you'd like to see other models evaluated (Gemini 3 Pro is already on the way, of course).
LocalLLaMA-ModTeam@reddit
Rule 2 - Posts must be related to the topic of LLMs (preferably local).
Pristine-Woodpecker@reddit
I don't get this - that bench includes a bunch of local LLMs and compares them to the SOTA. It's extremely valuable.
voronaam@reddit
Looking at the dataset on Hugging Face:
100% of the test problems are written in a single programming language, and one with fairly divergent syntax at that: the only one among the top-5 languages that doesn't use C-style curly brackets, etc.
Something tells me this benchmark does not really matter... It covers a fairly small and isolated corner of Software Engineering.