Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings (lmsys.org)
50 points by MMMercy2 1139 days ago
3 comments

Glad to see the old technique being used for new models.

You might also look at the Dota 2 7.33 update, which uses a new ranking algorithm called Glicko.

Could you provide a reference? Is it a variant of ELO?
Elo != ELO.

One is a rating system named after the creator, Arpad Elo. See https://en.wikipedia.org/wiki/Elo_rating_system

The other is a rock band that was formed in 1970. See https://en.wikipedia.org/wiki/Electric_Light_Orchestra
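For the rating system (not the band), here is a rough sketch of the textbook Elo update after one head-to-head match. The function names and the K-factor of 32 are my own assumptions for illustration; I don't know which constants the Chatbot Arena leaderboard actually uses.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both new ratings after one match.

    score_a is 1.0 if A won, 0.5 for a tie, 0.0 if A lost.
    k is an assumed K-factor (hypothetical default, not from the article).
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated models, model A wins the pairwise vote.
    print(elo_update(1000.0, 1000.0, 1.0))
```

An upset (a lower-rated model beating a higher-rated one) moves both ratings more than an expected result does, which is why the ordering settles as more votes come in.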

Check the matchmaking section: https://www.dota2.com/newfrontiers

Valve listed some reasons for making the change.

https://en.wikipedia.org/wiki/Glicko_rating_system

Surprised to learn StableLM is worse than plain LLaMA. Link to their leaderboard: leaderboard.lmsys.org
I’ve heard that it’s really bad for its size.
This is a very good idea.