by wat10000 99 days ago
"Proper" may be doing some work here, but such a test was run last year and GPT-4.5 and LLaMa-3.1-405B both passed. Oddly, GPT-4.5 was judged as human significantly more often than chance. https://arxiv.org/abs/2503.23674