Hacker News
by horhay 207 days ago
They ran the tests themselves, and only on semi-private evals. Basically the same caveat as when o3 supposedly beat ARC1.