by ottaborra 540 days ago
I don't understand. If Kaggle solutions were able to reach those scores, what the hell do these results mean?

https://arcprize.org/2024-results

1 comment

No individual Kaggle solution achieved 81%; that result came from an ensemble of models: https://x.com/fchollet/status/1865865271728390515

In my (possibly flawed) interpretation: o3's scores appear to be an achievement because they were attained by a single model, but the benchmark itself needs refinement before it can claim to be the measure of AGI it set out to be, since one can brute-force their way to similar results.