HN Mirror



	by tananaev 455 days ago
	Did I read this right that only 2 humans out of 400 solved the problems?

2 comments

trott 455 days ago

They started with N >= 120x3 tasks, and gave each task to 4-9 humans. Then they kept only those 120x3 tasks that at least 2 humans had solved.
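The filtering step described above can be sketched roughly as follows. This is only an illustration of the "keep tasks that at least 2 humans solved" rule, not the benchmark authors' actual pipeline: the function name `filter_tasks`, the task ids, the pool of 500 candidate tasks, and the 50% solve rate are all made-up assumptions.

```python
import random


def filter_tasks(results, min_solvers=2):
    """Return ids of tasks that at least `min_solvers` testers solved.

    `results` maps a task id to a list of booleans, one per tester
    who attempted the task (True means that tester solved it).
    """
    return [
        task_id
        for task_id, outcomes in results.items()
        if sum(outcomes) >= min_solvers
    ]


# Hypothetical candidate pool: every task is attempted by 4-9 testers.
# The pool size (500) and solve probability (0.5) are invented for the demo.
rng = random.Random(0)
results = {
    f"task_{i}": [rng.random() < 0.5 for _ in range(rng.randint(4, 9))]
    for i in range(500)
}

kept = filter_tasks(results)
print(f"{len(kept)} of {len(results)} candidate tasks kept")
```

Under this reading, "2 humans" is a per-task threshold applied to the handful of testers who saw that task, not a count of successful testers across the whole pool.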


tananaev 455 days ago

That's a very small sample size per task. I wonder what the result would be if they gave the whole data set to an average human. I tried some simple tasks and they are doable, but I couldn't figure out the hard ones.


mapmeld 455 days ago

No, they're saying that the problems have been reviewed / play-tested by ≥2 humans, so they are not considered unfair or too ambiguous to solve in two attempts (a critique of some ARC-AGI-1 puzzles that o3 missed). They have a lot of puzzles, so these were divided among some number of testers, but I don't think every tester had to try every problem.
