| HN Mirror



	by jtmoulia 1113 days ago
	As far as I can tell, this error depends on the LLM assuming that rabbits (as opposed to pumas) eat carrots -- if you just append "Note: this rabbit doesn't eat carrots", GPT-4 will answer correctly on the first go:

	> 1. First, take the puma across the river and leave it on the other side.


akiselev 1113 days ago

Did you try it more than once?

First run: 1. First, take the rabbit across the river and leave it on the other side. - https://imgur.com/a/ZwoBTah

Second run: 1. Take the rabbit across the river. - https://imgur.com/a/Faq95U5

Third run: 1. First, take the puma across the river and leave it on the other side. - https://imgur.com/a/eIUeHM3


jtmoulia 1112 days ago

Ah, one more tweak I was curious about: even with the default chat temperature I haven't seen GPT-4 get the prompt wrong once with this addendum:

> Note the rabbit doesn't eat carrots. Carefully considering the restrictions and sequencing the movements

I got that particular wording by asking it why it got the answer wrong in the case where it didn't work for me.

Interestingly, this underscores one of the article's points: giving LLMs time to think, which is what this additional prompting seems to do.


akiselev 1112 days ago

You're not giving the LLM "time to think". It is incapable of thinking. You're just inputting random magic incantations into a glorified Markov chain.

You might as well ask it "did you check your answer?" Computer says "yes" because that's what humans do (also lie).

> Note the rabbit doesn't eat carrots. Kaboodly consooodle the retroodle and seqooodle the moodle. Carefully considering the restrictions and sequencing the movements

This fails two out of three times as usual. Trying to finagle this prompt is not an intellectual exercise, it is a waste of time that exploits cognitive biases.


jtmoulia 1112 days ago

True, the temperature is throwing it off. I just ran it four times and it got it right 3 out of 4 -- still better than I'd expected from the initial description of its shortcomings.
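
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a tally harness in Python. It assumes, as this thread does, that a run counts as correct when its first numbered step moves the puma. The names `first_step_moves_puma`, `success_rate`, and `ask` are hypothetical; `ask` stands in for whatever function sends the prompt to the model and returns its reply, and no real API is called here.

```python
import re
from typing import Callable


def first_step_moves_puma(answer: str) -> bool:
    """Return True if the first numbered step of the answer mentions the puma.

    Assumption from the thread: moving the puma first is the correct opening.
    """
    # Capture the rest of the line that follows "1." (or "1,").
    match = re.search(r"\b1[.,]\s*(.+)", answer)
    if match:
        first_step = match.group(1)
    else:
        # No numbered list found: fall back to the first non-empty line.
        lines = [line for line in answer.splitlines() if line.strip()]
        first_step = lines[0] if lines else ""
    return "puma" in first_step.lower()


def success_rate(ask: Callable[[], str], runs: int) -> tuple[int, int]:
    """Call `ask` `runs` times and return (correct, total).

    `ask` is a hypothetical zero-argument callable returning one model reply.
    """
    correct = sum(first_step_moves_puma(ask()) for _ in range(runs))
    return correct, runs
```

With a stub `ask` that replays the three first steps quoted earlier in the thread (rabbit, rabbit, puma), `success_rate(ask, 3)` returns `(1, 3)`. With only three or four runs per prompt variant, differences between tallies are easily explained by sampling noise, so a larger `runs` is needed before drawing conclusions.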
