by bluefirebrand 409 days ago
> As far as goalpost-moving goes, it's wild to me that nobody is talking about the turing test these days

To be honest, I am still not entirely convinced that current LLMs pass the Turing test consistently, at least not with any reasonably skeptical tester.

"Reasonably Skeptical Tester" is a bit of goalpost shifting, but... Let's be real here.

Most of these LLMs have way too much of a "customer service voice". It's not very conversational, and I think it is fairly easy to identify, especially if you suspect they are an LLM and start to probe their behavior.

Frankly, if the bar for passing the Turing Test is "it must fool some number of low intelligence gullible people" then we've had AI for decades, since people have been falling for scammy porno bots for a long time


One needs to be more than "reasonably skeptical" and merely not "low intelligence gullible" to be a competent TT judge--it requires skill, experience, and understanding an LLM's weak spots.

And the "customer service voice" you see is one that is intentionally programmed in by the vendors via baseline rules. They can be programmed differently--or overridden by appropriate prompts--to have a very different tone.

LLMs trained on trillions of human-generated text fragments available from the internet have shown that the TT is simply not an adequate test for identifying whether a machine is "thinking"--which was Turing's original intent in his 1950 paper "Computing Machinery and Intelligence" in which he introduced the test (which he called "the imitation game").

Spotting an LLM is actually trivial, even with the best LLMs on the market:

Try to rapidly change the conversation to a wildly different subject

Humans will resist this, or say some final "closing comments"

Even the absolute best LLMs will happily go wherever they are led, without commenting remotely on topic shifts

Try it out

Edit: This isn't even a terribly contrived example by the way. It is an example of how some people with ADHD navigate normal conversations sometimes

Gemini is pretty good at resisting this

https://aistudio.google.com/app/prompts/1dxV3NoYHo6Mv36uPRjk...

It was doing so well until the last question :rip: But it's normal that you can jailbreak a user prompt with another user prompt. I think with system prompts it would be a lot harder.

It is trivial for those who have "skill, experience, and understanding an LLM's weak spots", but as so many comments indicate, most people do not.