by nxobject
343 days ago
I understand Chollet is transparent that the "branding" of the ARC-AGI-n suites is meant to be suggestive of their purpose rather than substantive. However, it does rub me the wrong way, as someone who's cynical about how branding can enable breathless AI hype in bad journalism. A hypothetical comparison would be labelling SHRDLU's (1968) performance on blocks world planning tasks as "ARC-AGI-(-1)". [0] A less loaded name like (bad strawman option) "ARC-VeryToughSymbolicReasoning" should still capture how the ARC-AGI-n suite is genuinely and intrinsically very hard for current AIs, and what progress satisfactory performance on the benchmark suite would represent. That is something Chollet has spelled out, and it has kept him grounded throughout! [1]
[0] https://en.wikipedia.org/wiki/SHRDLU
[1] https://arxiv.org/abs/1911.01547
In practice, when I have seen ARC brought up, it has been discussed with more nuance than any of the other benchmarks.
Unlike Humanity's Last Exam, which is the most egregious example I have seen, both in its naming and in how it is referenced as evidence of an LLM's capability.