Hacker News
by grahamplace 198 days ago
See: https://lmarena.ai/leaderboard
1 comment

Unless you overfit to benchmark-style scenarios and end up worse for real-world use.