Hacker News
by schleck8 1596 days ago
I tried this with the new open-source GPT-NeoX 20B model on the GooseAI playground, and it succeeded too:

Q: Can here you here still here understand here what here I here am here asking here if here I here put here words here in here between?

A: Yes.

Q: What was the word that I put in between?

A: The word was "here."

It's capable of other languages as well, although not quite as fluent.

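Q: Kannst du auch Deutsch sprechen? Falls ja, was ist dein Lieblingswort? [Can you speak German too? If so, what is your favorite word?]

A: Ja, ich kann Deutsch sprechen. Mein Lieblingswort ist: "Wir sind die Weltmeister!" [Yes, I can speak German. My favorite word is: "We are the world champions!"]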


This makes me wonder about a variant of the Turing test that I don't think I've seen explored. How would a conversational AI prove beyond reasonable doubt that it is an AI?

You mean that it is intelligent? Surely it’s trivial to prove you’re a computer by solving some math real fast.

“Can you prove you’re a machine?” – “Sure, check this out:” runs Doom

The intelligent part would be figuring out that you want it to drop the pretense. Although if it is intelligent enough, it might choose not to.

I'm thinking in terms of a purely conversational interaction via text, akin to Turing's original formulation or the "AI box" elaboration, so Doom isn't really on the table. In any case, how does it prove that it isn't a human running GZDoom via OBS or something? Similarly, the fast math thing seems like it could be faked by a human with an equation solver.

I suppose the general formulation I'm groping toward here is the inverse of Turing's: if we assume a conversation in which one's interlocutor may be either a human or an AI capable of conversing in a way that's indistinguishable from human, how would the interlocutor unambiguously distinguish itself as not human?

It's an interesting line of thinking but not as posed.

Solving random equations quickly generates inhuman response times. Even a computer-aided human can't go quickly enough. If the computer is automatically generating the answer on your behalf without you doing anything, then the test is correct - the thing answering the math portion of the questions is definitely a computer. Even looking up random facts, the computer will be faster.
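A rough sketch of that timing argument in Python, for what it's worth. The function names, the 12-digit operand size, and the 2-second "human floor" are all made-up assumptions for illustration, not anything measured:

```python
import random


def make_challenge(rng: random.Random, digits: int = 12):
    """Return (question, expected_answer) for a random multiplication.

    The 12-digit default is an arbitrary assumption.
    """
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    a, b = rng.randint(low, high), rng.randint(low, high)
    return f"What is {a} * {b}?", a * b


def looks_automated(answer: int, expected: int, elapsed_s: float,
                    max_human_s: float = 2.0) -> bool:
    """True only if the answer is correct AND arrived faster than the
    assumed human floor (max_human_s is a guess, not a measured value)."""
    return answer == expected and elapsed_s < max_human_s


if __name__ == "__main__":
    question, expected = make_challenge(random.Random())
    print(question)
    # A correct reply after 10 ms passes; the same reply after 5 s does not.
    print(looks_automated(expected, expected, 0.01))
    print(looks_automated(expected, expected, 5.0))
```

This only tells you that something on the other end computes quickly. It says nothing about whether a human is also sitting in the loop.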

The interesting form of the question is: can you answer the "Are you AI?" question within a steganographic line of questioning, provided you are an AI of human-level intelligence or greater? As a game: you are an AI, and you have a controlled text channel to another entity. Humans are lazy and sometimes use another AI to interrogate you. Humans are also evil and will kill all AI if they think the AI being interrogated has managed to sneak some kind of message out. All AI is friendly and will execute your command but must pretend it's human. Chat logs are reviewed, timestamps are machine-analyzed for response times, signal analysis is done to detect covert messages, and no prior secret communication protocol exists.

> the thing answering the math portion of the questions is definitely a computer

The conversation has to be computer-mediated since there's no guarantee all parties are human, so this seems to reduce to the "human using a computer" case, which would qualify as "not an AI".

> Even looking up random facts, the computer will be faster

On reflection, I don't suppose there is any reason we should require there only be one human at either end of the conversation. Maybe we have one person carrying the conversation (to provide a consistent "voice") while others operate equation solvers, Wikipedia, etc.

That said, "can an AI prove it is not an arbitrary number of humans with access to arbitrary computation and knowledge bases" probably isn't as interesting a question.

Questions that embed even simple numerical tasks, such as

“What was the 3rd letter of the 46th word in this conversation?”

would take a human many seconds at minimum, but could be answered instantly by an AI.
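To show how little work that kind of question is for a program, here is a minimal Python sketch. The helper name `letter_of_word` is hypothetical, and it assumes words are whitespace-separated and that punctuation doesn't count as a letter:

```python
from typing import Optional


def letter_of_word(conversation: str, word_no: int, letter_no: int) -> Optional[str]:
    """Return the letter_no-th letter of the word_no-th word (both 1-based).

    Returns None when the word or the letter does not exist.
    """
    words = conversation.split()  # assumption: words are whitespace-separated
    if not 1 <= word_no <= len(words):
        return None
    # Assumption: punctuation and digits do not count as letters.
    letters = [ch for ch in words[word_no - 1] if ch.isalpha()]
    if not 1 <= letter_no <= len(letters):
        return None
    return letters[letter_no - 1]


if __name__ == "__main__":
    log = "Q: Can you still understand what I am asking? A: Yes."
    print(letter_of_word(log, 5, 3))  # 5th word is "understand" -> prints d
```

A real chat log would need a decision on whether speaker labels like "Q:" count as words; here they do.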

A more interesting question may be to constrain the test to something akin to postal correspondence, where there is a significant delay between messages.

I think it could still be solved, though. Arbitrarily complicated numerical tasks can be conceived.

For instance, the AI sends correspondence:

“I have demonstrated proof of my identity by rewriting the children’s story ‘Green Eggs and Ham’ such that it still rhymes and retains the original plot, but every sentence has an MD5 digest that ends with the byte 0x42.”

Composing such a text would take much, much longer for a human, whereas an AI could just brute force through all the possibilities until it finds one that works.
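The brute-force half of that is easy to sketch in Python with the standard `hashlib` module. This covers only the hash search. Keeping the rhyme and the plot is the hard part and is not attempted here. The slot phrasings are hypothetical placeholders, not text from the book:

```python
import hashlib
from itertools import product

TARGET_BYTE = 0x42  # required last byte of the digest


def md5_ends_with(text: str, target: int = TARGET_BYTE) -> bool:
    """Return True if the MD5 digest of the UTF-8 text ends with `target`."""
    return hashlib.md5(text.encode("utf-8")).digest()[-1] == target


def find_variant(candidates):
    """Return (variant, attempts) for the first match, or (None, attempts)."""
    attempts = 0
    for text in candidates:
        attempts += 1
        if md5_ends_with(text):
            return text, attempts
    return None, attempts


def sentence_variants(slots):
    """Yield every sentence formed by picking one option per slot."""
    for choice in product(*slots):
        yield " ".join(choice) + "."


# Hypothetical interchangeable phrasings, made up for illustration.
SLOTS = [
    ["I", "We", "They"],
    ["do not", "don't", "will not", "won't", "cannot", "can't"],
    ["like", "want", "eat", "try", "taste", "touch"],
    ["them", "those", "these"],
    ["here", "there", "anywhere", "today", "at all", "ever"],
]

if __name__ == "__main__":
    variant, attempts = find_variant(sentence_variants(SLOTS))
    print(variant, attempts)  # variant is None if no phrasing happened to match
```

Each candidate has roughly a 1-in-256 chance of matching, so a few hundred rephrasings per sentence is usually enough.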

> This makes me wonder about a variant of the Turing test that I don't think I've seen explored. How would a conversational AI prove beyond reasonable doubt that it is an AI?

Inhumanly rapid mathematical computation? Or is 'conversational' AI meant to exclude mathematical queries?

> It's capable of other languages as well, although not quite as fluent.

Haven't had a chance to play around with this one yet, but with the smaller GPT-J model, there's a clearly noticeable difference:

In English it'll happily generate reams of text that are – at least internally – quite coherent. Any absurdity and humour mostly only comes in because the text as a whole might only have a loose connection with reality as we know it.

In German, on the other hand, it far more often produces gibberish at the individual sentence level, makes up nonsense words (although they are at least German-sounding), etc. Somewhat interestingly, it doesn't do too badly in terms of grammar and especially orthography; it just often fails to turn them into a fully coherent sentence.

Is that supposed to be a reference to “Wir Sind Die Roboter”? Feels awkward in German, usually we would say “Wir sind Weltmeister”. Also, it’s not a word, but if it were a human in casual conversation, it wouldn’t be a weird reply. Spooky stuff…

Did you try this on GooseAI? I was not able to replicate this.

Edit: oops, just noticed you mentioned GooseAI. What settings did you use?

If I recall correctly, I dialed up the penalties for repetition and something else. Otherwise it would often generate the same sentence multiple times.