Hacker News
by horhay 207 days ago
They ran the tests themselves, and only on the semi-private evals. It's basically the same caveat as when o3 supposedly beat ARC1.