Hacker News
by
zimbo63
117 days ago
This is an amazing eval metric that no one thought about! Such a creative idea. Have you thought of other games? How different is it from chess?
1 comment
mbh159
117 days ago
Yes, we have a new game launching every day this week. We're looking to add more domains to test how the jaggedness of AI differs between model providers and to better evaluate how their models perform across domains.