Hacker News
by haffi112 391 days ago
(watching live) I'm wondering how it performs on the METR benchmark (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...).