by ezyang 426 days ago
Lmarena isn't that useful anymore lol

I actually agree with that, but LMArena is still generally better than the other scores out there. Also, the quote is about a year old at this point.

In practice you have to evaluate the models yourself for any non-trivial task.