by Palmik 444 days ago
In your example you already indicated two tasks that you think might be hard for AI but easy for humans.

Who said that cooking dinner couldn't be part of ARC-AGI-<N>?

That’s precisely what I meant in my comment by “these types of tests.” People are eventually going to have some sort of standard for what they consider AGI. But that doesn’t mean the current benchmarks are useful for this task at all, and saying that the benchmarks could be completely different in the future only underscores this.
They are useful for reaching ARC-N+1.
How are any of these a useful path to asking an AI to cook dinner?

We already know many tasks that most humans can do relatively easily, yet most people don’t expect AI to be able to do them for years to come (for instance, L5 self-driving). ARC-AGI appears to be going in the opposite direction: can these models pass tests that are difficult for the average person to pass?

These benchmarks are interesting in that they show increasing capabilities of the models. But they seem to be far less useful at determining AGI than the simple benchmarks we’ve had all along (can these models do everyday tasks that a human can do?).

The "everyday tasks" you specifically mention involve motor skills that are not useful for measuring intelligence.
Genuine question: do you feel Waymo is not L5 self-driving? I think Waymo has L5, but it's not truly economic yet.