| HN Mirror


by richardw 784 days ago

I know that's the messaging, but the real link to reality is very tenuous. This was a great example from the last couple days:

https://twitter.com/colin_fraser/status/1785132544482226679

I just tried a similar question now with ChatGPT4:

"If a man and a goat are on one side of a river, what is the minimum amount of trips required to get the man and goat to the other side in a boat. Assume the boat can hold at most one animal and one human."

ChatGPT: 3 trips

That is very much closer to "trying to predict next word from examples" than "billion-dollar model with internal reasoning".
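For reference, the puzzle as worded can be checked mechanically. Below is a minimal sketch of a breadth-first search over the positions of the man and the goat. The state encoding and the `min_trips` name are my own assumptions for illustration, not anything from the linked tweet or from ChatGPT.

```python
from collections import deque

def min_trips(start=(0, 0), goal=(1, 1)):
    """Breadth-first search over (man_side, goat_side); 0 = near bank, 1 = far bank."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (man, goat), trips = queue.popleft()
        if (man, goat) == goal:
            return trips
        # The man always rows; the goat may ride along only if it is on his bank.
        moves = [(1 - man, goat)]
        if goat == man:
            moves.append((1 - man, 1 - goat))
        for state in moves:
            if state not in seen:
                seen.add(state)
                queue.append((state, trips + 1))
    return None  # goal unreachable

print(min_trips())  # 1
```

With nothing else to ferry and no constraint on leaving the goat alone, the search finds a single crossing, so "3 trips" looks like the pattern from the classic wolf, goat and cabbage version rather than an answer to this question.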


jack_pp 784 days ago

That sounds surprisingly close to how a toddler might reason. The only difference is that the toddler can eventually see the flaw in their reasoning if you press them long enough, while the LLM doesn't have the architecture for learning in real time yet.


richardw 784 days ago

I was repeatedly amazed at how smart my toddler was. You just feel the general intelligence.

She's a bit older now (5) but e.g. a few days ago I was talking about cleaning the whole house. She said "you didn't clean the WHOLE house, look there's something you didn't clean".


xcv123 784 days ago

The LLM does figure it out if you ask further questions in the same chat. Here's GPT-3.5 https://chat.openai.com/share/a8669390-8eb0-46c2-b804-3aafc3...


godelski 784 days ago

If you spoil it with your followup questions... which doesn't help, because the point of these is that they're controlled experiments where you do know what the right answer and logic are. You can't test when you don't.


xcv123 784 days ago

It's not spoiling anything. It's just an observation of the limits of current LLMs.

I tried a few chain-of-thought prompts for the original question, and GPT-3.5 was sometimes (randomly) able to find the correct answer on the first attempt. Here is one such attempt:

https://chat.openai.com/share/c144ba23-2f78-4cc8-a1c5-ca3106...


godelski 784 days ago

Take out this

  Instructions: 
  1. Do not include any assumptions that I have not mentioned here. 
  2. Before solving the problem, state the goal of the problem. 
  3. After each step of your reasoning, state where the man and the goat are now standing, and state if the goal has been achieved or not, if the goal has been achieved then stop. If the goal has not been achieved then explain why not.

Then tell me what happens

Spoilage is incredibly easy to do. It is about information leakage, and you have to think very carefully about how information can leak through in subtle ways. Specifically, #1 and #2 are strong hints that there is a trick to the problem (i.e. is this something you would use in a generic prompt?). #3 is a reiteration of the problem, which gives it extra weight. You can decrease the weight by restating it as "state where the man and any animals are located" (notice there's lower information gain here). "if the goal has been achieved then stop" is a big hint. To reason, it should know when to stop.
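One way to check which instruction is doing the leaking is to ablate them one at a time. Here is a rough sketch that builds every subset of the three instructions as a separate prompt. The instruction text is copied from the list above; pairing it with the question quoted upthread, and the `build_prompt` and `ablation_prompts` helpers, are assumptions for illustration. Sending the prompts to a model is deliberately left out.

```python
from itertools import combinations

# Assumed question text: the one quoted upthread, not necessarily the shared chat's.
QUESTION = (
    "If a man and a goat are on one side of a river, what is the minimum "
    "amount of trips required to get the man and goat to the other side in "
    "a boat. Assume the boat can hold at most one animal and one human."
)

INSTRUCTIONS = [
    "Do not include any assumptions that I have not mentioned here.",
    "Before solving the problem, state the goal of the problem.",
    "After each step of your reasoning, state where the man and the goat are "
    "now standing, and state if the goal has been achieved or not, if the goal "
    "has been achieved then stop. If the goal has not been achieved then "
    "explain why not.",
]

def build_prompt(question, instructions):
    """Return the bare question, or the question followed by a numbered list."""
    if not instructions:
        return question
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(instructions, 1))
    return f"{question}\n\nInstructions:\n{numbered}"

def ablation_prompts(question, instructions):
    """Yield (kept_indices, prompt) for every subset of the instructions."""
    for size in range(len(instructions) + 1):
        for kept in combinations(range(len(instructions)), size):
            yield kept, build_prompt(question, [instructions[i] for i in kept])

variants = list(ablation_prompts(QUESTION, INSTRUCTIONS))
print(len(variants))  # 8
```

Running each variant several times and comparing how often the answer is 1 would show which hints the model actually depends on, since a single lucky sample proves little.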

I posted some recent river crossing tweets in this comment that may be of interest to you https://news.ycombinator.com/item?id=40231409


xcv123 784 days ago

Yes, we know the current LLMs cannot solve the original prompt. That's why I experimented with different prompts.

The instructions are prompting it to proceed rigorously, as it is a logical problem, not a natural language problem. These models are primarily trained for solving natural language processing tasks, and so they are predisposed to answer in a certain way through training and tuning. The models produce less verbose output by default to reduce cost (each token costs money). Telling the model to generate more tokens in step-by-step reasoning enables it to "think" further as it can only "think" when generating each token.

OpenAI could train or tune ChatGPT to "spoil" itself by default when answering any problem that it identifies as a logic problem. It is somewhat arbitrary.


richardw 784 days ago

I tried "are you sure", which often triggers some reasoning, and it was pretty confident. I'm trying not to give it the answer, but run it as if I didn't have any special knowledge. GPT + human > GPT. I mean, we're treating these things like another kind of intelligence, not a hammer.

GPT4: https://chat.openai.com/share/1beb5391-d321-4515-945e-38233f...


sam0x17 784 days ago

Another oddly effective one is offering it a $500 tip.
