Hacker News
by haffi112 391 days ago
(watching live) I'm wondering how it performs on the METR benchmark (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...).