by tylerhou
299 days ago
Sorry, you're right that the chart on the home page does not show human performance. The leaderboard chart does: https://arcprize.org/leaderboard. By default, the leaderboard shows scores for both ARC-AGI 1 and 2. The models do much worse on 2 than on 1: the best-performing model scores around 15% (Grok 4, thinking), while humans are at ~100%.
EDIT: OK, I see there are 3 types of human baselines:
"Avg. Mturker" does worst. "Stem Grad" and "Human Panel" are basically equivalent in quality but differ in cost.
It's not obvious to me whether an average MTurker would be more or less clever than the average person. MTurk doesn't pay very well, so you'd think you'd have to be below average to want to do it. But perhaps it attracts people of above-average intelligence who just happen to live in the third world?