Hacker News
by wat10000, 99 days ago
"Proper" may be doing some work here, but such a test was run last year, and both GPT-4.5 and LLaMa-3.1-405B passed. Oddly, GPT-4.5 was judged to be the human significantly more often than chance.
https://arxiv.org/abs/2503.23674