Hacker News
by zimbo63 117 days ago
This is an amazing eval metric that no one thought about! Such a creative idea. Have you thought about other games? How different is it from chess?
1 comment

Yes, we have a new game launching every day this week. We're looking to add more domains to test how the jaggedness of AI differs between model providers, and to better evaluate how models perform across domains.