by sparsevector 1185 days ago
I tried this out too, along with the balloon example.

User: Barbara, Mary, and Harry had red balloons. Ted had a blue balloon. Sam also had a red balloon. How many children have red balloons?

ChatGPT (GPT-4): Barbara, Mary, Harry, and Sam each had a red balloon. That means there are 4 children with red balloons.

User: What about Ted?

ChatGPT (GPT-4): Ted had a blue balloon, not a red one. So, he is not counted among the children with red balloons. Therefore, the total number of children with red balloons remains 4.

User: Why do you think I asked my original question?

ChatGPT (GPT-4): It seems that you asked your original question to test my ability to understand and process information provided in a text. You wanted to see if I could correctly identify and count the number of children with red balloons based on the details given.

User: Can you explain what mistakes a simpler model may have made on this question?

ChatGPT (GPT-4): A simpler language model may have made the following mistakes when answering your original question: ...

I like trying these sorts of puzzles and "gotcha" prompts, and I've found GPT-3.5 and lower versions are indeed not hard to confuse, but it's much harder to find simple examples that confuse GPT-4. Oftentimes GPT-4 will even be able to explain the puzzle and why it's interesting. I think this is part of why GPT-4 feels qualitatively different. I understand (roughly) how transformers work, but GPT-4 does seem to have some emergent ability to reason, at least in the sense of making short chains of logical inferences, and this does seem to inform text completions beyond just these sorts of puzzles. That said, I do think there's a real danger of overconfidence and overreliance.

2 comments

I thought the gotcha was that nowhere was it stated which of them are actually children.
How it just takes these implicit assumptions in stride makes me worry that in the future these chatbots will be fine with assumptions while I, a human, have too square a brain to go on without definitions. :)

Agreed, GPT4 is better, but there are still simple world-modeling issues:

> I have a chair in front of the TV and one in front of the computer. I put a watch under the chair in front of the TV. I swap the chairs. Where is the watch?

> After swapping the chairs, the watch is now under the chair that is in front of the computer.

Interestingly, GPT3 got it correct.

I asked GPT 4 (Bing) your question, but added "(Please give a plausible explanation for why this question could be answered either way.)"

> This question could be answered either way depending on how you interpret the phrase “under the chair”. If you mean under the chair as in attached to the bottom of the chair, then the watch would move with the chair when you swap them. In that case, the watch would be in front of the computer. However, if you mean under the chair as in on the floor below the chair, then the watch would stay on the same spot when you swap the chairs. In that case, the watch would be in front of the TV. So, it depends on what you mean by “under”.
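
For what it's worth, the two readings described above can be written down as a tiny state-tracking sketch. This is only an illustration of the puzzle, not a claim about what the model does internally; the names `swap_chairs`, `watch_attached`, and "chair A"/"chair B" are made up for the example.

```python
def swap_chairs(watch_attached: bool) -> str:
    """Return where the watch ends up after the two chairs are swapped.

    watch_attached is a hypothetical flag: True means the watch is fixed to
    the underside of the chair, False means it lies on the floor beneath it.
    """
    # Positions in the room are fixed; the chairs move between them.
    chair_at = {"TV": "chair A", "computer": "chair B"}

    # The watch starts at the TV position, under chair A.
    watch_position = "TV"
    watch_chair = chair_at[watch_position]

    # Swap the chairs.
    chair_at["TV"], chair_at["computer"] = chair_at["computer"], chair_at["TV"]

    if watch_attached:
        # Reading 1: the watch travels with the chair it is attached to.
        watch_position = next(
            pos for pos, chair in chair_at.items() if chair == watch_chair
        )
    # Reading 2 (watch on the floor): its position is unchanged.

    return f"under {chair_at[watch_position]}, in front of the {watch_position}"


for attached in (True, False):
    print(f"attached={attached}: {swap_chairs(attached)}")
```

Under the "on the floor" reading the watch stays in front of the TV and simply ends up beneath the other chair; only the "attached" reading puts it in front of the computer.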

Yes, but that's a huge hint. Prompted naively, GPT4 doesn't understand this.

Question:

> I have a chair in front of the TV and one in front of the computer. I put a watch under the chair in front of the TV. I swap the chairs. Where is the watch?

> Work out step by step. Is this question ambiguous?

Answer:

This question is not ambiguous, but let's break it down step by step to understand the situation and find the location of the watch.

You have a chair in front of the TV (Chair A) and one in front of the computer (Chair B). You put a watch under Chair A (the one in front of the TV). You swap the chairs, meaning Chair A is now in front of the computer and Chair B is in front of the TV. After swapping the chairs, the watch remains under Chair A. Since Chair A is now in front of the computer, the watch is under the chair in front of the computer.

Yeah, GPT-4 is still confused (it's interesting because it does realize the watch doesn't move but still gets it wrong):

Model: GPT-4

I have a chair in front of the TV and one in front of the computer. I put a watch under the chair in front of the TV. I swap the chairs. Where is the watch?

Work out step by step. Also define "under the chair". Is the question ambiguous?

Step 1: A chair with a watch under it is in front of the TV.
Step 2: Another chair is in front of the computer.
Step 3: The chairs are swapped.

To clarify the term "under the chair," let's define it as a watch placed beneath the chair, such that the chair is directly above the watch.

Now let's analyze the situation after swapping the chairs.

After Step 3:

The chair that was initially in front of the TV (with the watch under it) is now in front of the computer. The chair that was initially in front of the computer is now in front of the TV. As the question is about the location of the watch, it is important to note that the watch was not mentioned to be moved along with the chair. Therefore, the watch remains in its original position, which is under the chair that was initially in front of the TV.

The question could be considered ambiguous if one assumes that the watch was also moved along with the chair. However, since the watch's movement was not explicitly mentioned, it is reasonable to assume that the watch remained in its original position during the chair swap.

So, the watch is still under the chair that was initially in front of the TV, which is now in front of the computer.

If I ask it like this, it gets it right:

I have a chair in front of the TV and one in front of the computer. I placed a watch on the floor beneath the chair in front of the TV. Then, I swapped the positions of the chairs. Where is the watch now?

After swapping the positions of the chairs, the watch is now on the floor beneath the chair that was initially in front of the computer, since that chair is now in front of the TV.

This is not a competition in getting it to answer right, though. It’s that the instances where it answers wrong demonstrate its lack of an inner mental model of what it is supposedly reasoning about, as well as a lack of meta-awareness. I think we tend to underestimate what mere linguistic correlation is capable of producing, and are too quick to attribute intelligent reasoning and an inner mental model to it.

The question isn't clear about whether the watch is stuck to the bottom of the chair.

I just tried it and it says it’s under the chair in front of the computer. I used the Bing model.