Hacker News
by hugetim 190 days ago
Contrary to what the leaderboard lists as the human score, their technical paper implies a human baseline of ~48%.