by optimalsolver
563 days ago
>This demonstrates empirically that ARC-AGI cannot be solved purely via memorization and interpolation

Now that the current challenge is over, and a successor dataset is in the works, can we see how well the leading LLMs perform against the private test set?

For example, Claude 3.5 gets 14% in semi-private eval vs 21% in public eval. I remember reading an explanation of "semi-private" earlier but cannot find it now.