|
>> Now that the AI research field is coming around to the idea that something beyond deep learning is needed, the story matters less, and the benchmark, and future versions, can stand on their utility as a compass towards AGI.

How so? All three top systems are deep neural net systems. The first place went to a system that, quoting from the "contributions" section of the paper, employed:

>> An automated data generation methodology that starts with 100-160 program solutions for ARC training tasks, and expands them to make 400k new problems paired with Python solutions

As I pointed out in another comment, the top results in ARC have been achieved by ordinary, deep-learning, big-data, memorisation-based approaches. You and fchollet (in these comments) try to claim otherwise, but I don't understand why.

In fact, no, I understand why. I think fchollet wanted to place ARC as "not just a benchmark", the opposite of what tbalsam is asking for above. The motivation is solid: if we've learned anything in the last twenty to thirty years, it is that deep neural nets are very capable at beating benchmarks. For any deep neural net model that beats a benchmark, though, the question remains whether it can do anything else besides. Unfortunately, that is not a question that can be answered by beating yet another benchmark.

And here we are now, and the first place in the current ARC challenge goes to a deep neural net system trained on a synthetically augmented dataset. The right thing to do now would be to scale back the claims about the magickal AGI-IQ test with unicorns, and accept that your benchmark is no different from any previous AI benchmark, that it is no more informative than any other benchmark, and that a completely different kind of test of artificial intelligence is needed.

There is, after all, such a thing as scientific integrity. You make a big conjecture, you look at the data, realise that you're wrong, accept it, and move on. For example, the authors of GLUE did that (with SuperGLUE). The authors of the Winograd Schema Challenge did that. You should follow their examples. |
What do you think about limiting the submission size? Kaggle does this sometimes.
With a limit like 0.1-1MB (compressed), you are basically saying: "Give me sample-efficient learning algorithms, not pretrained models."
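
To make the idea concrete, here is a rough sketch of what such a check could look like. Everything in it is an assumption for illustration: the 1 MB limit is just the upper end of the range above, zlib stands in for whatever compressor an organiser would actually pick, and the names `compressed_size` and `within_limit` are made up. Random bytes are used as a crude stand-in for trained weights, which typically compress poorly.

```python
import os
import zlib

# Hypothetical limit: 1 MB, the upper end of the range suggested above.
LIMIT_BYTES = 1_000_000


def compressed_size(payload: bytes) -> int:
    """Size in bytes of the payload after zlib compression at the highest level."""
    return len(zlib.compress(payload, 9))


def within_limit(payload: bytes, limit_bytes: int = LIMIT_BYTES) -> bool:
    """True if the compressed payload fits under the (assumed) size limit."""
    return compressed_size(payload) <= limit_bytes


# About 1.6 MB of repetitive source-like text: compresses to a tiny fraction.
source_like = b"def solve(grid):\n    return grid\n" * 50_000

# 4 MB of random bytes as a stand-in for pretrained weights: barely compresses.
weights_like = os.urandom(4_000_000)

for name, payload in [("source_like", source_like), ("weights_like", weights_like)]:
    print(name, len(payload), compressed_size(payload), within_limit(payload))
```

The point of measuring the compressed size rather than the raw size is that ordinary code passes easily, while anything that smuggles in a large pretrained model does not.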