HN Mirror


	by Palmik 492 days ago
	In your example you already indicated two tasks that you think might be hard for AI but easy for humans. Who said that cooking dinner couldn't be part of ARC-AGI-<N>?

Chathamization 492 days ago

That’s precisely what I meant in my comment by “these types of tests.” People are eventually going to have some sort of standard for what they consider AGI. But that doesn’t mean the current benchmarks are useful for this task at all, and saying that the benchmarks could be completely different in the future only underscores this.

pillefitz 492 days ago

They are useful for reaching ARC-AGI-<N+1>.

Chathamization 492 days ago

How are any of these a useful path to asking an AI to cook dinner?

We already know many tasks that most humans can do relatively easily, yet most people don’t expect AI to be able to do them for years to come (for instance, L5 self-driving). ARC-AGI appears to be going in the opposite direction: can these models pass tests that are difficult for the average person to pass?

These benchmarks are interesting in that they show increasing capabilities of the models. But they seem to be far less useful at determining AGI than the simple benchmarks we’ve had all along (can these models do everyday tasks that a human can do?).

fastball 492 days ago

The "everyday tasks" you specifically mention involve motor skills that are not useful for measuring intelligence.

mchusma 491 days ago

Genuine question: do you feel Waymo is not L5 self-driving? I think Waymo has L5, but it's not truly economical yet.
