Top 10 Models on Humanity's Last Exam. Opus 4.6 is in the lead.

Posted by Ok_Presentation1577@reddit | LocalLLaMA | 9 comments

With the release of Opus 4.6, here's the current top 10 on HLE (Humanity's Last Exam). I know benchmarks don't mean much on their own, but it's still interesting to compare models when a new one comes out.

P.S. I also really enjoyed reading the System Card Anthropic published on their blog. It covers use cases like finance, cybersecurity, biology, etc.