Hacker News
by ufmace 371 days ago
Now that I have a little more time to search around, I easily found this study, published March 31st this year, so not quite 3 months ago:

https://arxiv.org/abs/2503.23674

I only skimmed it, but I don't see anything clearly wrong with it. According to their results, GPT-4.5 with what they term a "persona" prompt does in fact pass a standard that seems to me at least a little harder than what you said: the judges actively pick the AI as the human, which is stricter than merely being "unable to distinguish".

It is a little surprising to me that only that one LLM actually "passed" their test, while several others performed somewhat worse. It's also not clear exactly how long ago the actual tests were run, though - this stuff moves super fast.