> Also, 1 odd thing I noticed is that the graph in their blog post shows the top 2 scores as “tuned”

Something I missed until I scrolled back to the top and reread the page was this:

> OpenAI's new o3 system - trained on the ARC-AGI-1 Public Training set

So yeah, the results were specifically from a version of o3 trained on the public training set.

On the one hand, I think that's a completely fair thing to do. It's reasonable to teach your AI the rules of the game, so to speak. There really aren't any spoken rules though, just pattern observation. So if you want to teach the AI how to play the game, you have to train it.

On the other hand, I don't think either the o1 models or Claude were trained on the dataset, in which case it isn't a completely fair comparison. If I had to guess, you could probably get 60% with o1 if you trained it on the public dataset as well.
Yeah, that makes this result a lot less impressive for me.