HN Mirror


	by wat10000 147 days ago
	"Proper" may be doing some work here, but such a test was run last year and GPT-4.5 and LLaMa-3.1-405B both passed. Oddly, GPT-4.5 was judged as human significantly more often than chance. https://arxiv.org/abs/2503.23674