| HN Mirror

by throwaway8503 1152 days ago

This is what I got on ChatGPT today. I assume it used GPT4:

        Prompt            ChatGPT           Actual       Match
  397,356 * 930,547   369,685,207,932   369,758,433,732  FALSE
   36,330 *  26,951       979,458,630       979,129,830  FALSE
    8,681 *   9,330        80,911,430        80,993,730  FALSE
      278 *     903           250,734           251,034  FALSE
       82 *      77             6,314             6,314  TRUE

Edit: The number of correct leading digits (counting from the left) exceeds 3 only on the smallest pair. It drops to two on the 3x3 set.
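The table and the leading-digit count can be re-checked with a few lines of Python. This is a minimal sketch: the rows are copied from the table above, and `leading_match` is a hypothetical helper name, not something from the original post.

```python
# Rows copied from the table above: (a, b, ChatGPT's answer).
rows = [
    (397_356, 930_547, 369_685_207_932),
    (36_330, 26_951, 979_458_630),
    (8_681, 9_330, 80_911_430),
    (278, 903, 250_734),
    (82, 77, 6_314),
]


def leading_match(x: int, y: int) -> int:
    """Count matching digits from the left, stopping at the first mismatch."""
    count = 0
    for dx, dy in zip(str(x), str(y)):
        if dx != dy:
            break
        count += 1
    return count


# For each row: operands, ChatGPT's answer, exact product, matched leading digits.
results = [(a, b, guess, a * b, leading_match(guess, a * b)) for a, b, guess in rows]

for a, b, guess, actual, matched in results:
    print(f"{a:,} * {b:,}: ChatGPT {guess:,}, actual {actual:,}, leading digits matched: {matched}")
```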

4 comments

ogogmad 1152 days ago

You probably used GPT-3.5. That said, I didn't manage to get GPT-4 to calculate 36,330 * 26,951 correctly. I suggested casting out 9s, casting out 11s, doing long multiplication, reversing the digits - nothing.
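For readers unfamiliar with casting out 9s and 11s: both are remainder checks on a claimed product. A minimal sketch follows, assuming the wrong answer from the table in the original post; `passes_check` is a hypothetical name. A failed check proves the claimed product is wrong, while a passed check only fails to rule it out.

```python
def passes_check(a: int, b: int, claimed: int, modulus: int) -> bool:
    """Compare (a mod m) * (b mod m) with the claimed product, both reduced mod m."""
    return (a % modulus) * (b % modulus) % modulus == claimed % modulus


a, b = 36_330, 26_951
wrong = 979_458_630  # ChatGPT's answer from the table in the original post

for modulus in (9, 11):
    print(f"mod {modulus}: wrong answer passes = {passes_check(a, b, wrong, modulus)}, "
          f"exact product passes = {passes_check(a, b, a * b, modulus)}")
```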

I have a theory that it does arithmetic badly because the logic goes right-to-left, when LLMs write left-to-right. If the digits were to be reversed, it might not make as many mistakes. I ran out of attempts before I could test this properly.

IX-103 1152 days ago

Did you adjust the prompt to ask it how a famous mathematician would answer the question? Or what a calculator would say the answer is?

Sometimes LLMs get math wrong because people got math wrong in the training data, and so they match the error frequency (https://learnprompting.org/docs/basics/roles).

roflyear 1151 days ago

Yes, that's exactly what people are saying here. It's not a criticism of the tool; it's an example of what the tool is and how it functions.

newswasboring 1151 days ago

I just had the following conversation on ChatGPT:

> Human: what is 36,330 26,951 * (input is expressed in reverse post fix notation)

> chatGPT: To evaluate this expression using reverse Polish notation, we need to use a stack to keep track of the intermediate results. Here's how we can evaluate the expression:

1. Read the first number "36,330" and push it onto the stack.

2. Read the second number "26,951" and push it onto the stack.

3. Read the operator "*", pop the top two numbers (26,951 and 36,330) from the stack, multiply them, and push the result (971,914,830) onto the stack.

4. The stack now contains only one number, which is the final result.

Therefore, 36,330 26,951 * = 971,914,830 in reverse Polish notation.

I don't think this is about notation.
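For comparison, the stack procedure described in the quoted reply can be sketched in Python. This is a minimal illustration, not what ChatGPT runs internally; `eval_rpn` and `OPS` are hypothetical names, and only three operators are supported. Run on the same input, it gives the exact product rather than the figure in the reply.

```python
import operator

# Supported binary operators (an assumption; the thread only uses "*").
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_rpn(expression: str) -> int:
    """Evaluate a space-separated reverse Polish expression over integers."""
    stack = []
    for token in expression.split():
        if token in OPS:
            # The right operand was pushed last, so it is popped first.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            # Strip thousands separators such as "36,330".
            stack.append(int(token.replace(",", "")))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


result = eval_rpn("36,330 26,951 *")
print(f"{result:,}")
```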

ogogmad 1151 days ago

You should ask it to show its working. The problem is that it uses a fixed amount of thinking for each character - and it writes left to right - while long multiplication is computed right to left. Past a certain number of digits, the amount of "thinking" needed to compute long multiplication in the wrong order (left to right) increases without bound -- accuracy suffers as a result.
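To make the right-to-left ordering concrete, here is a minimal sketch of schoolbook long multiplication for non-negative integers. The name `long_multiply` is hypothetical. Digits are stored least significant first, and each carry moves toward the more significant end, so the leading digits of the product are only settled after all lower digits have been processed.

```python
def long_multiply(a: int, b: int) -> int:
    """Schoolbook multiplication of non-negative integers, digit by digit."""
    # Reverse the digits so index 0 is the rightmost (least significant) digit.
    a_digits = [int(d) for d in reversed(str(a))]
    b_digits = [int(d) for d in reversed(str(b))]
    result = [0] * (len(a_digits) + len(b_digits))

    for i, da in enumerate(a_digits):
        carry = 0
        for j, db in enumerate(b_digits):
            total = result[i + j] + da * db + carry
            result[i + j] = total % 10
            carry = total // 10  # propagates toward the more significant digits
        result[i + len(b_digits)] += carry

    # Restore most-significant-first order; int() drops any leading zeros.
    return int("".join(str(d) for d in reversed(result)))


print(f"{long_multiply(36_330, 26_951):,}")
```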

A tangential point: Note that multiplication of decimal numbers _has_ to start with the rightmost digit to be done accurately. Consider x = 2/3 + 1/3 = 0.6(6) + 0.3(3), all digits repeating infinitely many times, so there's no rightmost digit: Notice that whichever digit you choose for x before the decimal point (either 0 or 1) could be wrong if you change any digit of 1/3. This is called the Table Maker's Dilemma.
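The 2/3 + 1/3 example can be checked with exact rationals. A minimal sketch, assuming truncation (not rounding) of each operand to a fixed number of decimal digits; `truncated` is a hypothetical helper. However many digits are kept, the truncated sum is 0.99...9, so its integer part is 0 while the exact sum is 1.

```python
from fractions import Fraction


def truncated(value: Fraction, digits: int) -> Fraction:
    """Keep only the first `digits` decimal digits of a non-negative fraction."""
    scale = 10 ** digits
    return Fraction(value.numerator * scale // value.denominator, scale)


exact = Fraction(2, 3) + Fraction(1, 3)

integer_parts = []
for digits in (1, 5, 20):
    approx = truncated(Fraction(2, 3), digits) + truncated(Fraction(1, 3), digits)
    integer_parts.append(int(approx))  # digit before the decimal point
    print(f"{digits} digits kept: integer part {int(approx)}, exact sum {exact}")
```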

The second paragraph is a tangent. My point was made in the first paragraph.

You said: "I don't think this is about notation" - Why did you ever think it was?

newswasboring 1150 days ago

I was thinking along the lines of maybe it reads the digits as individual tokens, and reading from left to right it gets confused. That's why I thought it might be about notation. I have learned it's definitely not the case (I read a bit about what tokens actually mean).

I also tried the show-your-work approach. I will not paste my conversations here as they are so long, but even when showing its work it doesn't give the right answer. Two funny things I noticed:

1. It either uses a completely wrong method to calculate it, or it shows the right "kind" of work and then gets the wrong answer

2. If I tell it it was wrong, it will just agree with me. If I multiply two huge numbers and tell it no, the answer is actually 42, it will just apologize. Then when I ask the reason why it thinks it's right, it will give the most bullshit things lol. Once it even said "I read the number wrong, the second number should be x I read y". The thing is, if I had actually given x in the input it would have gotten the answer right.

It's all very confusing.

jameshart 1152 days ago

That’s not ‘basic arithmetic’. Most humans would need a pen and paper to solve it.

GPT’s attention window is not equivalent to it being able to ‘write stuff down’ - it’s its equivalent of being able to remember the foregoing few minutes of conversation.

roflyear 1151 days ago

Stop redefining things to support your argument. "The basic operations under arithmetic are addition and subtraction, division and multiplication"

jameshart 1151 days ago

Right, but when you say an LLM can’t do basic arithmetic you’re presumably talking about mental arithmetic, right?

roflyear 1151 days ago

No.

optimalsolver 1152 days ago

Unless you're using Plus (black icon), you're using GPT-3.5 (green icon).

ravenstine 1151 days ago

Interesting. I didn't use numbers as large as yours, only ones in the low thousands, and ChatGPT w/ GPT-4 got them correct 3 times in a row.

What's also interesting, though not necessarily surprising, is how relatively close the incorrect answers are in your examples. It's not like the model will randomly spit out an answer like 42 if you ask it to multiply something like 36,330 by 26,951.
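How close the wrong answers are can be quantified as relative error. A minimal sketch using the four incorrect rows from the table in the original post; `relative_error` is a hypothetical helper name.

```python
# Incorrect rows from the original post's table: (a, b, ChatGPT's answer).
wrong_rows = [
    (397_356, 930_547, 369_685_207_932),
    (36_330, 26_951, 979_458_630),
    (8_681, 9_330, 80_911_430),
    (278, 903, 250_734),
]


def relative_error(guess: int, actual: int) -> float:
    """Absolute difference as a fraction of the exact value."""
    return abs(guess - actual) / actual


errors = [relative_error(guess, a * b) for a, b, guess in wrong_rows]

for (a, b, _), err in zip(wrong_rows, errors):
    print(f"{a:,} * {b:,}: relative error {err:.4%}")
```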

MacsHeadroom 1151 days ago

> I assume it used GPT4

You did not. You have to explicitly select it from the dropdown which is only available on paid Plus accounts.
