Opus 4.5 claims 1st place on fresh SWE-bench-like problems in October [SWE-rebench]

Posted by Fabulous_Pollution10 | LocalLLaMA

Hey everyone,

We were excited about yesterday's release of Opus 4.5 and rushed to update the SWE-rebench leaderboard.

As generally expected, Opus 4.5 has claimed first place. Remarkably, it is much more cost-efficient than Opus 4, and only slightly more expensive per problem than Sonnet 4.5.

Check out the full leaderboard. Feel free to reach out if you'd like to see other models evaluated (Gemini 3 Pro is already on the way, of course).