Hacker News
Measuring AI Ability to Complete Long Tasks (metr.org)
2 points by Gedxx 261 days ago