LMSYS (LMarena.ai) is highly susceptible to manipulation
Posted by Economy_Apple_4617@reddit | LocalLLaMA | View on Reddit | 7 comments
Here’s how I see it:
If you're an API provider for a closed LLM, like Gemini, you can set up a simple checker on incoming request traffic. This checker would verify whether the incoming query matches a pre-prepared list of questions. If it does, a flag is raised, indicating that someone has submitted that question, and you can see how your LLM responded. That’s it.
Next, you go to LMSYS, ask the same question, and if the flag is raised, you know exactly which of the two responses came from your LLM. You vote for it. Implementing this is EXTREMELY SIMPLE and COMPLETELY IMPOSSIBLE for LMSYS to track or verify. You wouldn’t even need human intervention—you could create a bot to cycle through the question list and vote accordingly. This way, you could artificially boost your model's Elo rating to any level you want.
So, the immediate question is: What is LMSYS doing to address this issue? The only real solution I see is for LMSYS to host the LLMs themselves, preventing API providers from intercepting requests and responses. However, even this wouldn't solve the problem of certain models being recognizable simply by the way they generate text.
brown2green@reddit
I see more fundamental issues with how a single Elo rating can be used to rate open language tasks where there might not necessarily be a win/lose outcome like with chess battles.
-p-e-w-@reddit
I think you’re dramatically overestimating the importance of LMSYS. It’s almost never cited. Press releases near-universally focus on automated benchmarks like MMLU, and those are much easier to game because models can simply be trained on them.
When spinning conspiracy theories, it’s important to not lose sight of the risk/benefit ratio of the supposed conspiracy. The top ranks in LMSYS are dominated by some of the world’s biggest companies, and largely match the perception of people who work in the field. Do you seriously believe that Google is going to implement the scheme you proposed, just so that users on this forum (which is probably the place where LMSYS is discussed the most) fawn over how it’s better than GPT 4.5 in this one metric, and risk massive ridicule if exposed? Come on.
TedHoliday@reddit
I dunno man, Meta got caught literally torrenting 82 TB of books they didn’t pay for. That’s millions of books. They’re not the only ones, and I guarantee that’s only the tip of the iceberg. I would definitely not put these companies on any kind of ethical high horse.
maikuthe1@reddit
Your reply didn't address anything in that comment lol. All you said is "um I think the conspiracy is real..."
TedHoliday@reddit
Wut
Economy_Apple_4617@reddit (OP)
You could put “Ranked #3 on the LMSYS leaderboard” into your presentation.
Don't underestimate this.
Recoil42@reddit
Probably nothing, at the moment — LMSYS is a non-profit community-driven organization, OP. You aren't paying for their service, and they don't owe the world absolute robustness, as much as I'm sure they strive for it. Yes, voting could probably be manipulated in a few ways, and on some level they're trusting the rest of the research community to participate with integrity. If it becomes an issue, I'm sure the team will devote some attention to it, but if you think this is a pressing problem now, consider donating your time to solve it.