Hacker News
Open-world evaluations for measuring frontier AI capabilities [pdf] (cruxevals.com)
2 points by randomwalker 56 days ago