


	by optimalsolver 563 days ago
	>This demonstrates empirically that ARC-AGI cannot be solved purely via memorization and interpolation

	Now that the current challenge is over, and a successor dataset is in the works, can we see how well the leading LLMs perform against the private test set?

1 comment

tuukkah 563 days ago

I think the "semi-private" numbers here already measure that: https://arcprize.org/2024-results

For example, Claude 3.5 gets 14% in semi-private eval vs 21% in public eval. I remember reading an explanation of "semi-private" earlier but cannot find it now.
