Hacker News
by atleastoptimal, 299 days ago
I'm referring to the long-horizon task benchmark, on which progress has been exponential since GPT-2:
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...