Hacker News
by throw83288 493 days ago
Apparently OpenAI's Deep Research already saturated a quarter of this benchmark, more or less a month in. But I also imagine it makes baffling mistakes anyway.
"Humanity's Laster Exam" coming up when?