by shnkr 812 days ago
I no longer trust the benchmarks. Other than trying it out myself, what else can we do here?
1 comments

It's already been done (Elo ratings, see the LMSYS rankings). I hope we're getting past the point where half of people haven't heard of it.
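For anyone who hasn't seen how an Elo-style ranking comes out of head-to-head votes, here is a minimal sketch. The K-factor of 32, the starting rating of 1000, and the model names are my own assumptions for illustration. This is not LMSYS's actual code or parameters.

```python
def expected_score(r_a, r_b):
    """Probability that a player rated r_a beats one rated r_b (standard Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0, initial=1000.0):
    """Apply one pairwise result in place. k and initial are assumed values."""
    r_w = ratings.get(winner, initial)
    r_l = ratings.get(loser, initial)
    # The winner gains more when the win was unexpected.
    delta = k * (1.0 - expected_score(r_w, r_l))
    ratings[winner] = r_w + delta
    ratings[loser] = r_l - delta
    return ratings


# Hypothetical votes: each tuple is (preferred model, other model).
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
ratings = {}
for winner, loser in votes:
    update_elo(ratings, winner, loser)

# Print models from highest to lowest rating.
for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

The idea is that the ranking comes from many blind pairwise preferences rather than a fixed test set, so it is harder to overfit than a static benchmark.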
I see. Thanks for the reference. I've followed it on X now.

https://twitter.com/lmsysorg/status/1772759835714728217