Hacker News
by chriscappuccio 535 days ago
o1 did terribly. o3 did well on arc-agi-pub (the public training data) but hasn't passed the private test yet.

Is the test still private once it has been run? If you call the OpenAI API and send it some data, OpenAI has access to the data. Did the benchmaker run the models locally somehow?
The private test is supposed to be run by the ARC-AGI organization themselves, without network access. That's why o3 has not been run against it yet. Not sure it will be possible either; that depends on what OpenAI is prepared to do about it.