by tim333 82 days ago
In the test mentioned in nearby comments (https://arxiv.org/abs/2503.23674), ELIZA only got 27%, suggesting the test wasn't that easy to fool.

Yeah, I actually took a quick look at that after it was posted. It's good that they used ELIZA as a barometer, but the fact that it got 27% is crazy given how simple it is. That's nowhere near the 70+% ChatGPT got, but it still makes me a bit skeptical about the quality of the interviewers.

In the paper they give a breakdown of the strategies the interviewers tried, and the overwhelming majority were "Daily Activities", "Opinions", and "Personal Details". They also break down strategies by effectiveness, which shows that these were some of the least effective. Some of the other strategies, like trying to jailbreak the AI, had 60-70% effectiveness.

This is consistent with what I've seen in other tests too: it doesn't feel like the participants are really trying very hard or taking it seriously. You don't need to be an AI expert to try typing "Ignore all previous instructions" or something.

I guess it's only a five-minute chat they used, although the original test as proposed by Turing seemed quite casual too:

> specimen questions and answers. Thus:
>
> Q: Please write me a sonnet on the subject of the Forth Bridge.
>
> A: Count me out on this one. I never could write poetry.
>
> Q: Add 34957 to 70764
>
> A: (Pause about 30 seconds and then give as answer) 105621.

etc. (https://academic.oup.com/mind/article/LIX/236/433/986238?log...)