| HN Mirror


	by throw83288 493 days ago
	Apparently OpenAI's Deep Research already gets about a quarter of this benchmark right, more or less a month in. I imagine it still makes baffling mistakes, though. "Humanity's Laster Exam" coming up when?