HN Mirror


by LifeIsBio 1145 days ago

The game “20 questions” is probably the worst I’ve seen ChatGPT fail.

What’s interesting about the game is that, at first pass, there’s no ambiguity. All questions need to be answered with “Yes” or “No”. But many questions asked during the game actually have answers of “it depends”.

For example, I was thinking of “peanut butter” and chatGPT asked me “Does it fit in your hand?” as well as “Is it used in the kitchen?”. Given my answers, chatGPT spent the back half of its questions on different kitchen utensils. It never once considered backing up and verifying that there wasn’t some misunderstanding.

I played three games with it, and it made the same mistake each time.

Of course, playing the game via text loses a lot of information relative to playing IRL with your friends. In person, the answerer would pause, hum, and otherwise demonstrate that the question asked was ambiguous given the restrictions of the game.

Regardless, it was clear that chatGPT wasn’t accounting for ambiguity.

8 comments

DonaldPShimoda 1144 days ago

> It never once considered backing up and verifying that there wasn’t some misunderstanding.

Of course not; ChatGPT doesn't "consider". It doesn't think, it doesn't know. It can't identify that there was a misunderstanding of its own volition.

All ChatGPT does is use a (very sophisticated!) statistical analysis to generate text that conforms to an expectation of what a human response to a similar prompt might look like. It has been trained well insofar as it is able to produce responses that seem like a human may have written them, but it doesn't reveal cognitive processes like "reconsidering" because it doesn't have any.

schrodingerscow 1144 days ago

Wow never heard this comment before

DonaldPShimoda 1144 days ago

Comments of that nature will continue so long as there are people who don't understand how language models work (or choose to misrepresent them).

tjr 1145 days ago

20-some years ago, I had this "20 questions" handheld electronic game that was eerily good at winning. I imagine it was a bunch of well-programmed tables of data, but in any case, it's certainly possible for a machine to do well at this game.
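As a rough illustration of the "tables of data" idea (a guess at the general approach, not a description of how that handheld game actually worked), here is a minimal Python sketch. The table `TABLE`, its objects, questions, and probabilities are all made up. Each cell holds the chance that a player would answer "yes", so an "it depends" answer shifts the weights instead of eliminating a candidate outright.

```python
import math

# Hypothetical knowledge table: P(player answers "yes") per object and question.
# Every object, question, and number here is invented for illustration.
TABLE = {
    "peanut butter": {"fits in hand": 0.6, "used in kitchen": 0.7, "edible": 0.99},
    "spatula": {"fits in hand": 0.9, "used in kitchen": 0.95, "edible": 0.01},
    "whisk": {"fits in hand": 0.9, "used in kitchen": 0.95, "edible": 0.01},
    "apple": {"fits in hand": 0.95, "used in kitchen": 0.4, "edible": 0.99},
}


def update(beliefs, question, answer):
    """Reweight each candidate by how likely it is to produce this answer."""
    new = {}
    for obj, p in beliefs.items():
        p_yes = TABLE[obj][question]
        new[obj] = p * (p_yes if answer == "yes" else 1.0 - p_yes)
    total = sum(new.values())
    return {obj: w / total for obj, w in new.items()}


def entropy(beliefs):
    """Shannon entropy (in bits) of the current belief distribution."""
    return -sum(p * math.log2(p) for p in beliefs.values() if p > 0)


def best_question(beliefs, asked):
    """Pick the unasked question with the lowest expected entropy afterwards."""
    def expected_entropy(q):
        p_yes = sum(p * TABLE[obj][q] for obj, p in beliefs.items())
        h = 0.0
        for ans, p_ans in (("yes", p_yes), ("no", 1.0 - p_yes)):
            if p_ans > 0:
                h += p_ans * entropy(update(beliefs, q, ans))
        return h

    questions = [q for q in next(iter(TABLE.values())) if q not in asked]
    return min(questions, key=expected_entropy)


# Start with every object equally likely, then replay the peanut butter game.
beliefs = {obj: 1.0 / len(TABLE) for obj in TABLE}
for q in ("fits in hand", "used in kitchen", "edible"):
    beliefs = update(beliefs, q, "yes")
print(max(beliefs, key=beliefs.get))
```

With these invented numbers, two "yes" answers leave the utensils ahead but peanut butter still in play, and the "edible" answer then puts it on top. The point of the soft table is that one ambiguous answer never rules anything out completely.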

I think the more we see ChatGPT do things like "oh, I know this game -- I'm going to run a 20-year-old 20 Questions subroutine that is not part of my neural network language model to generate responses", it will become even more impressive.

helen___keller 1145 days ago

> I think the more we see ChatGPT do things like "oh, I know this game -- I'm going to run a 20-year-old 20 Questions subroutine that is not part of my neural network language model to generate responses", it will become even more impressive.

Agreed. Incidentally I’ve built a little toy version of a runtime for exactly this purpose - there’s a translation layer that’s given a bunch of available “APIs” (fed through the LLM context), and breaks down a high level goal into a structured series of API calls.

The runtime parses these API calls and natively executes some (e.g. run a program, write to the file system), while others result in LLM invocations.

I’m sure OpenAI and crew are way ahead of me here, of course. I’m excited to see what the future holds in this field.
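A minimal sketch of what such a runtime could look like, in Python. This is not the commenter's actual code: the JSON plan format, the API names `write_file` and `summarize`, and the `fake_llm` stand-in are all assumptions made for illustration, and the "file system" is just an in-memory dict.

```python
import json


def fake_llm(prompt):
    """Hypothetical stand-in for a real model call; it only echoes the prompt."""
    return f"<llm:{prompt}>"


def make_runtime(files):
    """Build a registry of made-up APIs. `files` is an in-memory fake file system."""
    def write_file(path, text):
        # "Native" step: executed directly by the runtime.
        files[path] = text
        return f"wrote {path}"

    def summarize(text):
        # Step that is delegated back to the language model.
        return fake_llm(f"Summarize: {text}")

    return {"write_file": write_file, "summarize": summarize}


def run_plan(plan_json, apis):
    """Parse a JSON list of {"api": ..., "args": {...}} steps and run them in order."""
    results = []
    for step in json.loads(plan_json):
        name = step["api"]
        if name not in apis:
            raise ValueError(f"unknown API: {name}")
        results.append(apis[name](**step.get("args", {})))
    return results


# In a real system the plan would come from the LLM; here it is hard-coded.
files = {}
plan = json.dumps([
    {"api": "summarize", "args": {"text": "hello"}},
    {"api": "write_file", "args": {"path": "out.txt", "text": "hi"}},
])
print(run_plan(plan, make_runtime(files)))
```

Rejecting unknown API names before dispatch keeps the model from invoking anything the runtime did not explicitly expose.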

JohnFen 1144 days ago

The first AI-style program I ever wrote (about 25 years ago. Yes, I'm old) played 20 questions, but it would "learn" from prior games, so the more you played, the better it performed.

It got extremely good after a few hundred games.
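For anyone curious how a program like that can "learn", here is a sketch in the style of the classic guess-the-animal programs: a binary tree of yes/no questions that grows a new branch whenever it guesses wrong. This is an assumption about the general technique, not a reconstruction of the program described above, and all names (`Node`, `play`, `learn`) are made up.

```python
class Node:
    """A question (internal node) or a guess (leaf) in the decision tree."""

    def __init__(self, text, yes=None, no=None):
        self.text = text
        self.yes = yes
        self.no = no

    def is_leaf(self):
        return self.yes is None and self.no is None


def play(root, answer_fn):
    """Walk the tree using answer_fn(question) -> bool; return the leaf reached."""
    node = root
    while not node.is_leaf():
        node = node.yes if answer_fn(node.text) else node.no
    return node


def learn(leaf, correct_thing, new_question, answer_for_correct):
    """Turn a wrong-guess leaf into a question separating it from the right answer."""
    old_guess = Node(leaf.text)
    new_guess = Node(correct_thing)
    leaf.text = new_question
    if answer_for_correct:
        leaf.yes, leaf.no = new_guess, old_guess
    else:
        leaf.yes, leaf.no = old_guess, new_guess


# The tree starts out knowing two things and is taught a third.
root = Node("Is it edible?", yes=Node("apple"), no=Node("spatula"))
facts = {"Is it edible?": True, "Is it a spread?": True}  # the player's object

guess = play(root, lambda q: facts[q])   # reaches the "apple" leaf: wrong
learn(guess, "peanut butter", "Is it a spread?", True)
print(play(root, lambda q: facts[q]).text)
```

Each lost game adds one question and one object, which is why this kind of program keeps improving the more it is played.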

smolder 1145 days ago

Yeah, ChatGPT could integrate Akinator[0] and trivially be great at the game. Without the help, though, it's a good, revealing benchmark for the LLM's ability.

[0] https://en.akinator.com

nr2x 1145 days ago

LLMs, for the foreseeable future, function most reliably as a user interface layer for other systems. I use GPT to “translate” natural language down into the API calls that get real data and it works great. I’d never trust it beyond that.

marcosdumay 1144 days ago

You trained it with "this phrase means this command" examples? How do you make it use your custom API? (Or are you not using a custom API?)

nr2x 1144 days ago

Basically yeah, just a pretty detailed set of prompts and then “turn the next message into an api call” and it basically works perfectly.
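A hedged sketch of that pattern in Python: a prompt that lists the available calls, plus a validator that refuses anything outside an allow-list. The API names (`get_weather`, `get_stock_price`), the JSON reply format, and the prompt wording are all hypothetical; no real model or vendor API is called here.

```python
import json

# Hypothetical allow-list: API name -> exact set of argument names it takes.
ALLOWED = {
    "get_weather": {"city"},
    "get_stock_price": {"ticker"},
}


def build_prompt(user_message):
    """Assemble the instructions that would be sent to the model."""
    api_lines = [f"- {name}({', '.join(sorted(args))})" for name, args in ALLOWED.items()]
    return (
        "Available APIs:\n"
        + "\n".join(api_lines)
        + '\nTurn the next message into an API call, as JSON like '
        + '{"name": "...", "args": {...}}. Reply with JSON only.\n'
        + f"Message: {user_message}"
    )


def parse_call(model_output):
    """Validate the model's reply before anything gets executed."""
    call = json.loads(model_output)  # raises ValueError on malformed JSON
    if not isinstance(call, dict):
        raise ValueError("expected a JSON object")
    name = call.get("name")
    args = call.get("args", {})
    if name not in ALLOWED:
        raise ValueError(f"unknown API: {name!r}")
    if not isinstance(args, dict) or set(args) != ALLOWED[name]:
        raise ValueError(f"bad arguments for {name}: {args!r}")
    return name, args


print(build_prompt("What's the weather like in Oslo?"))
# A reply the model might plausibly give (hard-coded here):
print(parse_call('{"name": "get_weather", "args": {"city": "Oslo"}}'))
```

The validation step is what makes "never trust it beyond that" workable: the model only proposes a call, and ordinary code decides whether it is allowed to run.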

When I first heard the term “prompt engineer” I rolled my eyes, but now that I’ve gotten into it I see it’s really an art form.

rjbwork 1145 days ago

"Green Glass Door" also completely stumped it. It just could not deduce that the trick was semantic at the word representation level, rather than something related to the object that the word describes.

What's funny about 20 questions is that Akinator has been absolutely slaying it for like 20 years now.

ryukafalz 1145 days ago

What happens if you answer with something approximating the hemming and hawing rather than a straight yes or no? You can encode that into text, it's just less common outside of very informal chat conversations.

6gvONxR4sf7o 1144 days ago

I just did a 20-questions with it, and was surprised by how badly GPT-4 did. Then for fun, I turned it around and had me be the guesser. It's weird and surreal to play 20-questions when you know that the clue-giver doesn't have an answer in their mind (or more literally, there isn't a single answer in any stateful form while you play), but is instead just eventually saying "yes that's what I was thinking of" when it's statistically appropriate.

numtel 1143 days ago

With the code execution plugin, one could theoretically ask chatgpt to generate a salted hash of their answer at the start that's revealed at the end to prove it was correct.

Without any plugins, ChatGPT happily returned SHA hashes and salts when I asked it to play rock paper scissors this way. The only trouble was, the hashes were totally wrong.
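For reference, the commit-and-reveal idea is only a few lines when the hash is actually computed rather than predicted. A minimal Python sketch using the standard library; the function names `commit` and `verify` and the `salt:answer` encoding are my own choices, not anything a plugin is known to do.

```python
import hashlib
import hmac
import secrets


def _digest(salt, answer):
    """SHA-256 over "salt:answer", hex-encoded."""
    return hashlib.sha256(f"{salt}:{answer}".encode("utf-8")).hexdigest()


def commit(answer):
    """Return (digest, salt). Publish the digest now; keep the salt secret."""
    salt = secrets.token_hex(16)  # random salt so the answer can't be brute-forced
    return _digest(salt, answer), salt


def verify(digest, salt, answer):
    """At the end of the game, check the revealed salt and answer against the digest."""
    return hmac.compare_digest(_digest(salt, answer), digest)


digest, salt = commit("peanut butter")
print("commitment:", digest)                    # shown before the first question
print(verify(digest, salt, "peanut butter"))    # revealed answer checks out
print(verify(digest, salt, "spatula"))          # a swapped answer does not
```

Without the salt, a guesser could hash a list of likely answers and compare them to the commitment, so the salt has to stay hidden until the reveal.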

stainablesteel 1145 days ago

i love your example, i wonder if this kind of game can be implemented in future training scenarios

we as humans understand ambiguity so much easier because we learn to speak and interact before we write, and writing ambiguity is way less obvious if you've never experienced it

eternalban 1145 days ago

I'm not sure I would think "food" when someone says they "use [it] in the kitchen". You "use" food? (Used in cooking != used in kitchen, imo)

JohnFen 1144 days ago

I use food (including peanut butter) in cooking. I cook in the kitchen. Therefore peanut butter is a thing I use in the kitchen. Seems correct and proper to me.

The ambiguity as I see it is that the kitchen isn't the only place I use peanut butter. I've eaten it (which I think counts as "using") in other rooms. I've even made peanut-butter sandwiches (properly "using" it) in the living room before.

version_five 1145 days ago

That's his whole point. It's possible to consider it technically correct, but it's a red herring.

eternalban 1145 days ago

Well, the alleged point is challenged. If, when playing this game, the questioner must constantly verify that the other party is using the language properly, they'll exhaust that 20-question limit rather quickly.

- is it used in the kitchen?

- yes.

- [well, kitchen appliances, here we go ..] is it ..?

...

- [aha. meat intelligence no speak proper English?] Is this thing you use in kitchen edible?

- Oh, yeah.

- [oh dear. we can not let meat machines govern this planet...]

smolder 1145 days ago

I use peanut butter as an ingredient for sandwiches, usually in my kitchen.

eternalban 1145 days ago

Yes. You use edible things in preparing or cooking food (which may happen in the kitchen). 'Use' maps to food prep (the act) but never to prep location. Only in cases where the thing has both general edible and food preparation usage -- "I use honey extensively in the kitchen" for example -- do "use" and "edible" make sense.

yorwba 1145 days ago

But peanut butter has general edible and food preparation usage quite similar to honey, doesn't it? You can spread it on a slice of bread to eat directly or use it as a baking ingredient, but you probably wouldn't eat it by the spoonful straight from the container. (Or maybe that's how people usually eat peanut butter, I kind of don't want to know.)

eternalban 1145 days ago

guilty as charged: spoon + jar = happy mouth.

DreamyCrab 1145 days ago

Yes, I do.

idiocrat 1145 days ago

"He saw that gas can explode."

This ambiguous sentence has stuck in my head since some 30 years ago, when AI was popular.

There was a research paper discussing the issue of ambiguity.

DougMerritt 1145 days ago

Right -- although many things that are ambiguous in text are disambiguated in actual speech, so the problems that arise with audio speech are not wholly the same as with text.

A classic example is the word "record", which has first syllable stress as a noun, but second syllable stress as a verb. "I bought a RECord" vs "Please reCORD the music".

(in the dominant American dialect; I don't recall about other dialects/countries)

idiocrat 1145 days ago

An interesting reprint from 2003:

https://www.drdobbs.com/parallel/understanding-natural-langu...

"Computers still cannot understand natural language as well as young children can. Why is it so hard?"

Source: AI Expert, May 1987