by tananaev 455 days ago
Did I read this right that only 2 humans out of 400 solved the problems?
2 comments

They started with N >= 120x3 tasks, and gave each task to 4-9 humans. Then they kept only those 120x3 tasks that at least 2 humans had solved.
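A rough sketch of that filtering step, in case it helps: the task names and per-tester results below are made up for illustration, and `filter_tasks` is a hypothetical helper, not anything from the benchmark's actual tooling. It only shows that "at least 2 humans" is a per-task threshold, not a count over all testers.

```python
def filter_tasks(attempts, min_solvers=2):
    """Keep tasks solved by at least `min_solvers` testers.

    attempts: dict mapping task id -> list of bools, one per human tester
    (True means that tester solved the task). Hypothetical data layout.
    """
    return [task for task, results in attempts.items()
            if sum(results) >= min_solvers]

# Made-up example: each task was shown to between 4 and 9 testers.
attempts = {
    "task_a": [True, False, True, False],            # 2 solvers: kept
    "task_b": [False, False, False, True, False],    # 1 solver: dropped
    "task_c": [True, True, True, True, False, True], # 5 solvers: kept
    "task_d": [False] * 9,                           # 0 solvers: dropped
}

kept = filter_tasks(attempts)
print(kept)  # ['task_a', 'task_c']
```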
That's a very small sample size per task. I wonder what the result would be if they gave the whole data set to an average human. I tried some of the simple tasks and they are doable, but I couldn't figure out the hard ones.
No, they're saying that the problems have been reviewed / play-tested by ≥2 humans, so they are not considered unfair or too ambiguous to solve in two attempts (a critique of some ARC-AGI-1 puzzles that o3 missed). There are a lot of puzzles, so they were divided among some number of testers, but I don't think every tester had to try every problem.