| HN Mirror

by Workaccount2 815 days ago

Here is a prompt you can plug into GPT4:

"I have a problem for you to solve. Muffins sell for $3/each. rick bakes 30 muffins a day. Tom bakes 2 muffins monday, 4 tuesday, 6 wednsdays, up to 14 on sunday. On days which tom and jerry combined bake more than 41 muffins, the price of the muffins drops to $2.50. How much total revenue do rick and tom take in during a full week, combined."

Please tell me how ChatGPT4 writing a script to solve that is not logical reasoning, while a human pulling out pen and paper to do it is...
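For reference, here is a minimal sketch of the kind of script the comment describes. It rests on several assumptions that the prompt leaves implicit: "tom and jerry" is read as Tom and Rick, Tom's counts continue the pattern 2, 4, 6, ... up to 14 on Sunday, every muffin baked is sold, and the reduced price applies to all muffins sold on a day that crosses the threshold. The names used here are illustrative only.

```python
# Assumptions (not stated outright in the prompt): "tom and jerry" means
# Tom and Rick, every muffin baked is sold, Tom's counts follow the pattern
# 2, 4, 6, ..., 14, and the reduced price covers the whole day's sales.
RICK_PER_DAY = 30
TOM_PER_DAY = [2 * (i + 1) for i in range(7)]  # Monday..Sunday
THRESHOLD = 41
FULL_PRICE = 3.00
REDUCED_PRICE = 2.50


def weekly_revenue(tom_counts, rick_per_day=RICK_PER_DAY):
    """Return the week's combined revenue under the assumptions above."""
    total = 0.0
    for tom in tom_counts:
        combined = tom + rick_per_day
        # The price drops only when the combined count exceeds the threshold.
        price = REDUCED_PRICE if combined > THRESHOLD else FULL_PRICE
        total += combined * price
    return total


print(weekly_revenue(TOM_PER_DAY))  # 755.0
```

Under this reading only Saturday (42) and Sunday (44) cross the threshold, so the total is 540 for Monday through Friday plus 215 for the weekend.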

Jensson 815 days ago

> Please tell me how ChatGPT4 writing a script to solve that is not logical reasoning, while a human pulling out pen and paper to do it is...

I changed the prompt a bit (made all the numbers 3-4 digits) and gpt-4 answered with this. It just made up numbers for the days you didn't give numbers for, so it failed before it even got to the arithmetic. After I told it this about Tom: "Tom bakes 2911 muffins monday, 491 tuesday, 699 wednsdays, up to 149 on sunday.", it assumed Sunday's number applied to all the other weekdays not given, which a human wouldn't do, and it missed the "up to" statement. Maybe the large numbers I gave threw it off, but if that is enough to throw it off, it just shows that it can't really reason.

So thanks for that, more evidence these models are bad at reasoning.

Here is the first part of what it responded with, it is wrong already here:

   First, let's calculate the number of muffins baked by Tom during the week:

   Monday: 2911
   Tuesday: 491
   Wednesday: 699
   Thursday: 149
   Friday: 149
   Saturday: 149
   Sunday: 149

Edit: Here it made an arithmetic error just below: 4062 is not greater than 4199. So that is two critical errors. I taught math at college for years, and you wouldn't find many students making mistakes like this:

   Let's determine the days when Tom and Rick combined bake more than 4199 muffins:

   Monday: 2911 (Tom) + 3571 (Rick) = 6482
   Tuesday: 491 (Tom) + 3571 (Rick) = 4062
   Wednesday: 699 (Tom) + 3571 (Rick) = 4270

   On Monday, Tuesday, and Wednesday, they bake more than 4199 muffins combined, so the price of the muffins drops to $2851.50 on those days.
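The comparison in the quoted output can be checked mechanically. This is a small sketch that uses only the per-day numbers quoted above; the variable names are illustrative.

```python
THRESHOLD = 4199

# Combined totals exactly as they appear in the quoted model output.
combined = {
    "Monday": 2911 + 3571,
    "Tuesday": 491 + 3571,
    "Wednesday": 699 + 3571,
}

# Keep only the days whose combined total strictly exceeds the threshold.
over = [day for day, total in combined.items() if total > THRESHOLD]
print(over)  # ['Monday', 'Wednesday']
```

Tuesday's total of 4062 falls below 4199, so it should not be in the list of reduced-price days.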

Workaccount2 815 days ago

Just so we have this straight, you completely changed the nature of the problem (by turning a perfect information problem into an imperfect information problem) and then are looking at me with a straight face to make your point? Please...

Unless of course you didn't realize that Tom has a pattern to his baking, at which point the irony becomes palpable.

And on top of that, I am willing to bet if you give me your prompt, I would be able to restructure it in such a way that GPT4 would be able to answer it correctly. More often than not, people are just really bad at properly asking it questions.

Jensson 815 days ago

> Just so we have this straight, you completely changed the nature of the problem (by turning a perfect information problem into an imperfect information problem) and then are looking at me with a straight face to make your point? Please...

I used your exact quote and just changed the numbers, it is still a perfect information problem.

Or, ah right, you mean you gave me an imperfect information problem, since you assumed the reader would guess those values. Yeah, I read it as a perfect information problem where all values were given, and then you would give the income as a range of possible income values based on how many muffins were baked on Sunday. None of the LLMs I sent it to managed to solve it entirely, even though it is a pretty easy problem.

A reasonable way to parse your sentence is:

   Monday: 2, Tuesday: 4, Wednesday: 6, Sunday: 0-14, rest: doesn't work so 0
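Under that parse the answer is a range rather than a single number. A minimal sketch, assuming Rick still bakes 30 a day, every muffin is sold, and the $2.50 price applies to the whole day whenever the combined count exceeds 41 (all names here are illustrative):

```python
RICK = 30
THRESHOLD = 41
FIXED_TOM = [2, 4, 6, 0, 0, 0]  # Monday..Saturday under this reading


def day_revenue(tom, rick=RICK):
    """Revenue for one day; the price drops if the combined count exceeds 41."""
    combined = tom + rick
    price = 2.50 if combined > THRESHOLD else 3.00
    return combined * price


base = sum(day_revenue(t) for t in FIXED_TOM)

# Try every possible Sunday count from 0 to 14 and record the weekly total.
totals = {sunday: base + day_revenue(sunday) for sunday in range(0, 15)}
low = min(totals, key=totals.get)
high = max(totals, key=totals.get)
print(low, totals[low], high, totals[high])  # 0 666.0 11 699.0
```

The maximum is not at 14 muffins: once Tom bakes 12 or more on Sunday the combined count passes 41 and the lower price cuts that day's revenue.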

> Unless of course you didn't realize that Tom has a pattern to his baking, at which point the irony becomes palpable.

If you didn't say he baked on those days, then he didn't bake on those days. The specification is clear. If I say "I will bake 2 muffins on Tuesday and 6 muffins on Sunday", the reasonable interpretation is that I won't bake anything the rest of the days. Why would you assume he baked anything at all on those days?

Or if I say "Emily will work Mondays and Thursdays", do you just guess the rest of the days she will work? No, you assume she just works those days.

Is that a standard problem you wrote from memory? Not sure why you would assume there were muffins baked in the days you didn't list.

For example, if I say Tom bakes up to 14 muffins on Sunday, then the reasonable interpretation is that Tom will bake 0-14 muffins on Sunday. Maybe you should write the prompt more clearly if you mean something else? Because as written, anyone would assume that he didn't bake on the other days, and that on Sundays he baked up to 14 muffins.

Anyway, it failed even with your "up to" interpretation, where the reader is meant to fill in the values: it still made that math error. But the fact that it used your "up to" interpretation is itself a huge red flag. In a real environment nobody would give that kind of information as a riddle with hidden values. You would specify all the values for each day each person worked, and for the rest you would assume the person just isn't working and baked 0 muffins. If the LLM starts to guess values from patterns and words where that doesn't make sense, then it is really unreliable.

Workaccount2 815 days ago

I can see why some humans would struggle with the phrasing.

Thankfully GPT4 has strong reasoning skills and knew exactly what I meant.

https://chat.openai.com/c/b0ed06f1-c0d3-46a6-b07c-289b328417...

I encourage you to see the chat yourself, and would love to hear how it's not reasoning.

Edit: Fixed Link: https://chat.openai.com/share/991ca8af-f735-436f-bfc2-5df929...

You can click the [>_] at the end for the code generated.

Seems we have hit the reply cutoff.

Jensson 815 days ago

I just get this from your link

   Unable to load conversation b0ed06f1-c0d3-46a6-b07c-289b328417bb

magicalhippo 815 days ago

> For example, if I say Tom bakes up to 14 muffins on Sunday, then the reasonable interpretation is that Tom will bake 0-14 muffins on Sunday.

I don't have a stake in this muffin game, but that's indeed how I interpreted the instructions when reading them.

Had it said "and so on up to 14 on Sunday" I would assume he baked each day.
