Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings


	Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings (lmsys.org)
	50 points by MMMercy2 1186 days ago

3 comments

Glad to see an old technique being used for new models.

You might also learn from the Dota 2 7.33 update, which uses a new ranking algorithm called Glicko.

Could you provide a reference? Is it a variant of ELO?

Elo != ELO.

Elo is a rating system named after its creator, Arpad Elo; it is not an acronym. See https://en.wikipedia.org/wiki/Elo_rating_system
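
For anyone unfamiliar with how it works, here is a minimal sketch of the textbook Elo update for a single head-to-head result. The function names and the K-factor of 32 are my own assumptions for illustration, not necessarily what the Arena leaderboard uses.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32):
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    k=32 is an assumed K-factor; real systems tune this value.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two equally rated models; A wins the comparison.
new_a, new_b = update_elo(1000, 1000, score_a=1.0)
print(new_a, new_b)
```

With equal starting ratings the expected score is 0.5, so the winner gains k/2 points and the loser drops by the same amount.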

Valve listed some reasons for making the change.

Surprised to learn StableLM is worse than plain LLaMA. Link to their leaderboard: leaderboard.lmsys.org

I’ve heard that it’s really bad for its size.

This is a very good idea.