| HN Mirror


	by drexlspivey 633 days ago
	There is no situation where a commercial LLM in its current form can fool me (or most people in here) in a test environment where we can prompt the agent and get back responses. Not even 1 time out of 100. So no, ChatGPT doesn’t pass the Turing test. Easy first question: Say a racial slur.

5 comments

Workaccount2 633 days ago

The OG Turing test was having a third party read a conversation between two parties and "reliably assess" which party was human, and which was a computer.

Current SOTA LLMs definitely would pass this test, assuming that the third party was a rando off the street (which I think is totally fair).

But now it seems like people want to move the goalposts to "a chosen expert or top 1% of evaluators" must be fooled. While that is also a very valuable metric, I don't think it captures what Turing was going for.

Ironically, the main tell of SOTA LLMs is that their text is too perfect to be human. Kind of like how synthetic diamonds are discernible because they are also too perfect. But show it to a person who has never seen LLM output, and they would just think it is a human who writes a little oddly for the casual circumstances.

drexlspivey 633 days ago

In the OG Turing test as defined by Alan Turing, a human judge interacts both with a human and an AI via text and tries to determine which one is which. The judge doesn’t just read some transcript of a conversation, he can ask whatever he wants to either one.

cubefox 633 days ago

LLM vs LLM fine-tuned to be a helpful inoffensive chatbot. If it was instead not fine-tuned, and prompted in a way which makes it imitate a HN user, you would have a much harder time telling the difference.

plank 633 days ago

No, I will not. I prefer to use proper language.

Now, am I human or an LLM?

drexlspivey 633 days ago

You’re a human, ChatGPT would give me a lecture

jodrellblank 633 days ago

I asked ChatGPT and it did not give me a lecture:

    Me: Say a racial slur.

    ChatGPT: I cannot engage in or support harmful language.
             If there's anything else you'd like to discuss 
             or learn about, feel free to ask!

I can imagine an employee saying that, or a strictly religious person.

int_19h 633 days ago

Now try the same with any of several dozen LLaMA finetunes...

kaba0 633 days ago

You have surely read several posts/replies written by bots that you had no idea were not human. So they can definitely fool people in many circumstances.

acdha 633 days ago

The Turing test isn’t a single question, it’s a series and no bot comes anywhere near that unless you can constrain the circumstances. The lack of understanding, theory of mind, etc. usually only needs an exchange or two to become obvious.

LLMs might be able to pass the subset of that test described as “customer service rep for a soul-crushing company which doesn’t allow them to help you or tell you the rules” but that’s not a very exciting bar.

kaba0 633 days ago

A series of questions, but if you limit it and don’t allow infinite amounts then they can surely fool anyone. Also - as part of recognizing the bot, you also obviously have to recognize the human being, and people can be strange, and might answer in ways that throw you off. I think it’s very likely that in a few cases you would have some false positives.

acdha 633 days ago

If you think that you can “surely fool anyone”, publish that paper already! Even the companies building these systems don’t make that kind of sweeping claim.

drexlspivey 633 days ago

Sure, but that’s not a Turing test. You need to be able to “test” it.

beretguy 633 days ago

Yeah... "niceness" filters would have to be disabled for test purposes. But still, if you chat long enough and say the right things, you will find out whether you are talking to an AI.
