	by atleastoptimal 299 days ago
	I'm referring to the long-horizon task benchmark, where the length of tasks models can complete has grown exponentially since GPT-2: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...