by YeGoblynQueenne 601 days ago

>> If it was structured with a reasonable amount of compute, and instead, time-accuracy gates were used for prizes, it would be much more open. But people do not use it because the game is rigged to begin with!

The entire benchmark is set up to make it _artificially_ hard for deep learning: there are only three examples for each task, AND the private test set has, from what I can tell, a different distribution than the public training and validation sets. That is a violation of PAC-Learning assumptions, which require training and test examples to be drawn from the same distribution, so why should anyone be surprised if machine learning approaches in general can't deal with it?

Even I (long story) find ARC to be unfair in the simplest sense of the word: it does not make for a level playing field that would allow disparate approaches to machine learning to be compared fairly. Strangely and uniquely, the unfairness is aimed at the dominant approach, deep learning, whereas every other benchmark tends to skew towards deep learning (e.g. by relying on huge, feature-based, labelled datasets).

But why's that? If ARC-AGI is a true test of AGI, or intelligence, or whatever it is supposed to be (an IQ test for AIs), then why does it have to jump through hoops just to defend itself from the dominant approach to AI? If it's a good test for AI, and the dominant approach to AI can't really do AI, then the dominant approach should not be capable of passing the test even without any shenanigans like reduced compute or few examples.

Is the purpose to demonstrate that deep neural nets can't generalise from few examples? That's machine learning 101 (although I guess there are still those who missed the lecture). Is it to encourage deep neural nets to get better at generalising from few examples? Well, first place just went to a big, deep, bad neural net with data augmentation, so that doesn't even work.