by submarius
44 days ago
Cool work. Quick question: how should readers think about the fact that Interfaze-Beta is on the leaderboard you built? Not saying anything's wrong with the methodology, just curious how you'd recommend a third party verify the ranking is neutral to the choices you made (datasets, difficulty weights, reasoning-off default, etc.).
To validate the choices and configurations, feel free to give it a read. We also break down our methodology in the blog and, in more depth, in the paper.