| HN Mirror


	by lukan 40 days ago
	"It is a bit arbitrary, but I think this is what they're tracking." I don't know if they can get their numbers right this way, but this seems a way more useful metric than theoretical capabilities.

1 comment

cyanydeez 40 days ago

ok, but aren't you just measuring efficiency, and not improvements in the big I in AGI?

jsnell 40 days ago

No? I think you're misunderstanding what is being measured.

It is purely a test of capabilities (can it do a thing that takes a human $X hours), not efficiency (how fast will it do it).

Leynos 40 days ago

It also measures task coherence—ability to plan, form contingencies, recover from errors, mitigate accumulation of errors, and reconcile findings across a long context window.

lukan 40 days ago

Yes, but this study was not about that and "just efficiency" is actually what most people are after.

At least I want AI to solve my problems, not score high on an academic leaderboard.
