Hacker News
by tylerhou 299 days ago
Sorry, you're right that the chart on the home page does not have human performance. The leaderboard chart does: https://arcprize.org/leaderboard. And the leaderboard by default shows scores for ARC-AGI 1 and 2. The models are much worse at ARC-AGI 2 than at ARC-AGI 1; the best-performing model scores around 15% (Grok 4, thinking), while humans are at ~100%.

Thanks, and do we know if the humans are average people off the street, or unusually-intelligent people?

EDIT: OK, I see there are 3 types of humans:

"Avg. Mturker" does worst. "Stem Grad" and "Human Panel" are basically equivalent in terms of quality but differ in cost.

It's not obvious to me whether an average Mturker would be more or less clever than the average person. Mturk doesn't pay very well, so you'd think you'd have to be below average to want to do it. But potentially it attracts people of above-average intelligence who just happen to live in the third world?

Additional caveat: some of the "Avg. Mturker" cohort are almost certainly using LLMs to participate.