| It actually beats the human average by a wide margin:
| - 64.2% for humans vs. 82.8%+ for o3.
| ...
| Private Eval:
| - 85%: threshold for winning the prize [1]
| Semi-Private Eval:
| - 87.5%: o3 (unlimited compute) [2]
| - 75.7%: o3 (limited compute) [2]
| Public Eval:
| - 91.5%: o3 (unlimited compute) [2]
| - 82.8%: o3 (limited compute) [2]
| - 64.2%: human average (Mechanical Turk) [1] [3]
| Public Training:
| - 76.2%: human average (Mechanical Turk) [1] [3]
| ...
| References:
| [1] https://arcprize.org/guide
| [2] https://arcprize.org/blog/oai-o3-pub-breakthrough
| [3] https://arxiv.org/abs/2409.01374
Their post has STEM grads at nearly 100%.