by gzer0 945 days ago
Well, in any case, I conducted an experiment to test GPT-4's logical reasoning skills.

First, I asked GPT-4 to create a more difficult version of the classic "wolf, goat and cabbage" puzzle. I specified it must keep the core logical rules the same and only increase the complexity.

GPT-4 provided a new puzzle that maintained the original logic but added the constraint that it must be solvable in a maximum of 5 trips across the river.
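For reference, a trip limit like this can be checked mechanically. The sketch below is my own illustration and assumes the classic three-item rules (the wolf cannot be left with the goat, nor the goat with the cabbage, and the boat carries at most one item); the harder puzzle GPT-4 generated is not reproduced in this post, so its exact rules are not modeled here. The names `is_safe` and `min_crossings` are made up for the example. It runs a breadth-first search to find the fewest one-way crossings.

```python
from collections import deque

# Assumed classic rules, not the GPT-4-generated variant.
ITEMS = ("wolf", "goat", "cabbage")
UNSAFE = ({"wolf", "goat"}, {"goat", "cabbage"})  # pairs that cannot be left unattended


def is_safe(left_bank, farmer_on_left):
    """A state is unsafe only if a forbidden pair sits on the bank without the farmer."""
    right_bank = set(ITEMS) - left_bank
    unattended = right_bank if farmer_on_left else left_bank
    return not any(pair <= unattended for pair in UNSAFE)


def min_crossings():
    """Breadth-first search over (items on left bank, farmer on left?) states."""
    start = (frozenset(ITEMS), True)
    goal = (frozenset(), False)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (left, farmer_left), trips = queue.popleft()
        if (left, farmer_left) == goal:
            return trips
        here = left if farmer_left else set(ITEMS) - left
        for cargo in [None, *here]:  # cross alone, or with one item from this bank
            new_left = set(left)
            if cargo is not None:
                if farmer_left:
                    new_left.remove(cargo)
                else:
                    new_left.add(cargo)
            state = (frozenset(new_left), not farmer_left)
            if state not in seen and is_safe(state[0], state[1]):
                seen.add(state)
                queue.append((state, trips + 1))
    return None  # no solution under these rules


print(min_crossings())
```

Under these assumed classic rules the search reports 7 one-way crossings, so whether a 5-trip cap is satisfiable depends entirely on how the new puzzle counts trips and what rules it adds.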

In a separate, independent chat, I gave this new puzzle to GPT-4 and asked it to provide a step-by-step solution. It output an answer.

Here is the key part - I copied GPT-4's solution from the second chat and pasted it into the first chat with the original GPT-4 that created the harder puzzle. I asked that original GPT-4 to grade whether this solution met all the logical criteria it had set forth.

Remarkably, this first GPT-4 was able to analyze the logic of an answer it had not generated itself. It confirmed that the solution made good strategic decisions and met the logical constraints it had defined, including solving the puzzle in a maximum of 5 trips.
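The grading step can also be cross-checked without a model. Here is a minimal sketch, again assuming the classic three-item rules rather than the generated puzzle (which is not reproduced in this post). `check_solution` and the `max_trips` parameter are hypothetical names for illustration. A solution is given as the cargo carried on each crossing, with `None` meaning the farmer crosses alone.

```python
# Assumed classic rules; the GPT-4-generated variant may differ.
ITEMS = {"wolf", "goat", "cabbage"}
FORBIDDEN = [{"wolf", "goat"}, {"goat", "cabbage"}]


def check_solution(moves, max_trips=None):
    """Return (ok, reason) for a list of per-crossing cargo; None means crossing alone."""
    left, right = set(ITEMS), set()
    farmer_left = True
    if max_trips is not None and len(moves) > max_trips:
        return False, f"uses {len(moves)} trips, limit is {max_trips}"
    for step, cargo in enumerate(moves, start=1):
        src, dst = (left, right) if farmer_left else (right, left)
        if cargo is not None:
            if cargo not in src:
                return False, f"step {step}: {cargo} is not on the farmer's bank"
            src.remove(cargo)
            dst.add(cargo)
        farmer_left = not farmer_left
        # The bank the farmer just left (src) is now unattended.
        if any(pair <= src for pair in FORBIDDEN):
            return False, f"step {step}: unsafe pair left alone"
    if left or farmer_left:
        return False, "not everything reached the far bank"
    return True, "valid"


classic = ["goat", None, "wolf", "goat", "cabbage", None, "goat"]
print(check_solution(classic))
print(check_solution(classic, max_trips=5))
```

A deterministic checker like this gives the same verdict every time, which makes it a useful baseline to compare the model's grading against.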

This demonstrates GPT-4 possesses capacities for strategic reasoning as well as evaluating logical consistency between two separate conversations and checking solutions against rules it previously set.

https://chat.openai.com/share/996583dd-962b-42a8-b4b9-e29c59...

2 comments

What if, in a different chat session, GPT gives the exact opposite answer, i.e., it says the offered solution is bogus? Would you even know unless someone tried it and showed it to be so? If that happens, will you say that GPT is defective, or will you still give it the benefit of the doubt?

Since GPTs are not deterministic, any intelligence we attribute to them depends on the observer doing the attributing.

My sense is that confirmation bias and cherry-picking are playing a role in the general consensus that GPTs are intelligent.

For example, people show off beautiful images created by image generators like DALL-E while quietly discarding the ones that were terrible or completely missed the mark.

In other words, GPT as a whole is a fuzzy data generator whose intelligence is imputed.

My suspicion is that GPT is going to be upper-bounded by the average intelligence of humanity as a whole.

This is not evidence of strategic reasoning.

You are assuming that human-style thinking and object modeling are going on. You provided enough information in the text for the analysis to be done from the text alone.

Not included is the second, isolated chat from which I retrieved the answer.