Hacker News
Measuring AI Ability to Complete Long Tasks – METR (metr.org)
2 points by diginova 316 days ago