| Author here -- six months ago we launched ARC Prize, a huge $1M experiment, to test if we need new ideas for AGI. The ARC-AGI benchmark remains unbeaten and I think we can now definitely say "yes". One big update since June is that progress is no longer stalled. Coming into 2024, the public consensus vibe was that pure deep learning / LLMs would continue scaling to AGI. The fundamental architecture of these systems hasn't changed since ~2019. But this flipped in late summer. AlphaProof and o1 are evidence of this new reality. All frontier AI systems are now incorporating components beyond pure deep learning like program synthesis and program search. I believe ARC Prize played a role here too. All the winners this year are leveraging new AGI reasoning approaches like deep-learning guided program synthesis, and test-time training/fine-tuning. We'll be seeing a lot more of these in frontier AI systems in coming years. And I'm proud to say that all the code and papers from this year's winners are now open source! We're going to keep running this thing annually until it's defeated. And we've got ARC-AGI-2 in the works to improve on several of the v1 flaws (more here: https://arcprize.org/blog/arc-prize-2024-winners-technical-r...). The ARC-AGI community keeps surprising me. From initial launch, through o1 testing, to the final 48 hours when the winning team jumped 10% and both winning papers dropped out of nowhere. I'm incredibly grateful to everyone and we will do our best to steward this attention towards AGI. We'll be back in 2025! |
It is a great unit test for reasoning -- that's fantastic! And maybe it is indeed the best way to test for this -- who knows exactly. But the claim is a little grandiose for what it is. It's somewhat like saying that testing on string parity is the One True Test of an optimizer's efficiency.
I'd heartily recommend taking the marketing vibrance down a notch and keeping things a bit more measured. It's not entirely a meme, but some of the more serious researchers don't take it as seriously because of the marketing. And those are the kind of people you want to attract to this sort of thing!
I think there is a potentially good future for ARC! But with the current tone, it might struggle to attract some of the talent you want working on this problem.