| HN Mirror


by godelski 784 days ago

Take out this

  Instructions: 
  1. Do not include any assumptions that I have not mentioned here. 
  2. Before solving the problem, state the goal of the problem. 
  3. After each step of your reasoning, state where the man and the goat are now standing, and state if the goal has been achieved or not, if the goal has been achieved then stop. If the goal has not been achieved then explain why not.

Then tell me what happens

Spoilage is incredibly easy to do. It is about information leakage, and you have to think very carefully about how information can leak through in subtle ways. Specifically, #1 and #2 are strong hints that there is a trick to the problem (i.e., is this something you would use in a generic prompt?). #3 is a reiteration of the problem, which gives it extra weight. You can decrease the weight by restating it as "state where the man and any animals are located" (notice there's lower information gain here). "if the goal has been achieved then stop" is a big hint. To reason, it should know when to stop.
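A minimal sketch of that ablation, in case it helps: it only builds the prompt variants (full instructions, none, each instruction removed in turn, and #3 softened as suggested above) so they can be pasted into whatever model you are testing. `PROBLEM` is a hypothetical placeholder, since the puzzle text itself is not quoted in this thread, and the helper names `build_prompt` and `ablation_variants` are made up for illustration.

```python
# Instruction lines quoted from the comment above.
INSTRUCTIONS = [
    "Do not include any assumptions that I have not mentioned here.",
    "Before solving the problem, state the goal of the problem.",
    "After each step of your reasoning, state where the man and the goat "
    "are now standing, and state if the goal has been achieved or not, "
    "if the goal has been achieved then stop. If the goal has not been "
    "achieved then explain why not.",
]

# Hypothetical placeholder: substitute the actual puzzle text here.
PROBLEM = "<river crossing problem text goes here>"


def build_prompt(problem, instructions):
    """Return the problem followed by a numbered instruction list (if any)."""
    if not instructions:
        return problem
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(instructions, 1))
    return f"{problem}\n\nInstructions:\n{numbered}"


def ablation_variants(problem, instructions):
    """Build prompt variants that differ only in which hints they leak."""
    variants = {
        "full": build_prompt(problem, instructions),
        "none": build_prompt(problem, []),
    }
    # Drop one instruction at a time to see which one carries the hint.
    for k in range(len(instructions)):
        kept = instructions[:k] + instructions[k + 1:]
        variants[f"without_{k + 1}"] = build_prompt(problem, kept)
    # Lower-information restatement of #3: no mention of the goat.
    softened = list(instructions)
    softened[2] = softened[2].replace(
        "the man and the goat are now standing",
        "the man and any animals are located",
    )
    variants["softened_3"] = build_prompt(problem, softened)
    return variants


if __name__ == "__main__":
    for name, prompt in ablation_variants(PROBLEM, INSTRUCTIONS).items():
        print(f"--- {name} ---\n{prompt}\n")
```

Comparing the model's answers across these variants is one way to see how much of the "reasoning" comes from the hints rather than from the problem statement.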

I posted some recent river crossing tweets in this comment that may be of interest to you https://news.ycombinator.com/item?id=40231409

1 comments

xcv123 784 days ago

Yes, we know the current LLMs cannot solve the original prompt. That's why I experimented with different prompts.

The instructions are prompting it to proceed rigorously, as it is a logical problem, not a natural language problem. These models are primarily trained for solving natural language processing tasks, and so they are predisposed to answer in a certain way through training and tuning. The models produce less verbose output by default to reduce cost (each token costs money). Telling the model to generate more tokens in step-by-step reasoning enables it to "think" further as it can only "think" when generating each token.

OpenAI could train or tune ChatGPT to "spoil" itself by default when answering any problem that it identifies as a logic problem. It is somewhat arbitrary.


godelski 784 days ago

> The instructions are prompting it to proceed rigorously, as it is a logical problem, not a natural language problem.

I think you're missing a bit here. Look at the middle tweet, where the person constructed it to fail the logic. There are no tricks. What you're missing is the signal you're giving it, and how it is spoiling the question in a subtle way. That's very different than a reasoning machine. We can't trust it to reason if it can only "reason" when we give it explicit instructions to do so, instructions that do not generalize to many tasks. That's not really reasoning...

> OpenAI could train or tune ChatGPT to "spoil" itself by default

They have, and it's provable.


xcv123 784 days ago

> That's not really reasoning

They are trained a certain way to perform specific types of tasks, primarily natural language processing tasks. They have necessarily learned some methods of reasoning in order to do what they were trained to do. No one is pretending that these are symbolic logic mechanical theorem provers. They are tuned a certain way to respond in a specific manner, and they only do what they are told. If you want it to use reasoning, then you need to tell it to use reasoning. It's a chatbot running on a neural network, and it is not self-aware.

Hopefully the next generation of AI will be more reasonable. We are not there yet.
