why is o1 not ranked #1 on lmsys
Posted by Ok-Engineering5104@reddit | LocalLLaMA | 4 comments
o1 ranks #1 or #2 for all of the subcategories:
- math #1
- instruction following #1
- multi-turn #2
- coding #2
- hard prompts (overall) #1
- hard prompts (english) #1
- longer query #2
... but it's ranked #3 overall lol. i don't get how this works
Feztopia@reddit
There are probably a number of conversations which don't fit into any of the subcategories. You expect the list to be exhaustive but that doesn't have to be the case.
Due-Memory-6957@reddit
Because there are other things they couldn't categorize that still count for overall.
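To make the point above concrete, here is a rough sketch with made-up numbers. The model names other than o1, the vote tallies, and the "uncategorized" bucket are all hypothetical, and ranking by pooled win rate is a simplification: the real leaderboard fits a rating model to pairwise votes rather than using raw win rates. It only shows that a model can place #1 or #2 in every listed subcategory and still land at #3 once the uncategorized conversations are counted.

```python
# Hypothetical (wins, battles) tallies per model and category.
# These are illustrative numbers, NOT real leaderboard data.
votes = {
    "o1":      {"math": (80, 100), "coding": (70, 100), "uncategorized": (40, 100)},
    "model_b": {"math": (70, 100), "coding": (75, 100), "uncategorized": (70, 100)},
    "model_c": {"math": (60, 100), "coding": (60, 100), "uncategorized": (80, 100)},
}


def rank(scores):
    """Map each model to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {model: position + 1 for position, model in enumerate(ordered)}


def category_ranks(votes, category):
    """Rank models by win rate inside a single category."""
    rates = {
        model: cats[category][0] / cats[category][1]
        for model, cats in votes.items()
    }
    return rank(rates)


def overall_ranks(votes):
    """Rank models by win rate pooled over every category, listed or not."""
    pooled = {}
    for model, cats in votes.items():
        wins = sum(w for w, _ in cats.values())
        battles = sum(n for _, n in cats.values())
        pooled[model] = wins / battles
    return rank(pooled)


math_ranks = category_ranks(votes, "math")
coding_ranks = category_ranks(votes, "coding")
overall = overall_ranks(votes)

print("math:", math_ranks)
print("coding:", coding_ranks)
print("overall:", overall)
```

With these numbers, o1 places first in math and second in coding, but its weaker showing in the uncategorized bucket drags its pooled win rate below both other models.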
Electroboots@reddit
LMSYS doesn't rate by technical capability; it ranks by a (very specific, low-context) human preference score. LiveBench is better for something like this.
acec@reddit
Is o1 a local LLM? Then... who cares ;-)