https://www.dwarkesh.com/p/francois-chollet (June 2024, about ARC-AGI-1. Note the AGI right in the name)

> I’m pretty skeptical that we’re going to see an LLM do 80% in a year. That said, if we do see it, you would also have to look at how this was achieved. If you just train the model on millions or billions of puzzles similar to ARC, you’re relying on the ability to have some overlap between the tasks that you train on and the tasks that you’re going to see at test time. You’re still using memorization.

> Maybe it can work. Hopefully, ARC is going to be good enough that it’s going to be resistant to this sort of brute force attempt but you never know. Maybe it could happen. I’m not saying it’s not going to happen. ARC is not a perfect benchmark. Maybe it has flaws. Maybe it could be hacked in that way. e.g. If ARC is solved not through memorization, then it does what it says on the tin.

[Dwarkesh suggests that larger models get more generalization capabilities and will therefore continue to become more intelligent]

> If you were right, LLMs would do really well on ARC puzzles because ARC puzzles are not complex. Each one of them requires very little knowledge. Each one of them is very low on complexity. You don't need to think very hard about it. They're actually extremely obvious for human

> Even children can do them but LLMs cannot. Even LLMs that have 100,000x more knowledge than you do still cannot.

If you listen to the podcast, he was super confident, and super wrong. Which, like I said, NBD. I'm glad we have the ARC series of tests. But they have "AGI" right in the name of the test.