Hacker News
by ravenstine 1154 days ago
Dunno if there's some trickery going on under the hood, but GPT-4 does basic arithmetic correctly.
2 comments

It statistically outputs the next probable value of text. There is a lot of math-related text in the CommonCrawl (likely the biggest source of its training). That is all the company that owns it claims that it does. There is no calculation, and therefore no basic arithmetic being done correctly.
If it's only statistics, what is the neural network for?
Implement the probabilistic model of token sequences and fit it to the training data?
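To make "fit a probabilistic model of token sequences to the training data" concrete, here is a toy sketch. It is a count-based bigram model, which is an assumption chosen for brevity and is not how GPT-4 is built; the corpus and the function names are made up for illustration.

```python
from collections import Counter, defaultdict

def fit_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn the raw counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {tok: n / total for tok, n in counts[prev].items()}

# Toy "training data": the model only ever sees text, never a calculator.
corpus = "2 + 2 = 4 . 2 + 3 = 5 . 2 + 2 = 4 .".split()
model = fit_bigram(corpus)
probs = next_token_probs(model, "=")
print(probs)
```

After "=", this toy model predicts "4" twice as often as "5" purely because of the counts. A neural network replaces the count table with a learned function of the whole preceding context, so whether it ends up computing anything is the question being argued in this thread.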

You could say the same for a character recognition system.

Emergent behavior does happen with neural networks. It is correct to say that GPT-4 doesn't do math correctly, but it may be incorrect to say a large language model can't in principle.
Is there any evidence it can? It's not incorrect to say there is no God, for example.
It's extremely easy to test arithmetic on random numbers it'll never have seen in the training set. GPT-4 can do arithmetic just fine.
And it will give you some random number with maybe the right order of magnitude. Doesn't seem to do arithmetic very fine at all actually.
Yeah, that's not true lol. Its arithmetic is not perfect (mostly multiplication; addition is fine), but there's nothing random about even the wrong/non-exact numbers.
we're talking about an llm dude

how am i supposed to have a conversation with someone who is gassing up "not perfect [arithmetic]" (something a wristwatch from the '80s can do) and won't even believe what the creators of said machine say about how it works?

This is what I got on ChatGPT today. I assume it used GPT4:

        Prompt            ChatGPT           Actual       Match
  397,356 * 930,547   369,685,207,932   369,758,433,732  FALSE
   36,330 *  26,951       979,458,630       979,129,830  FALSE
    8,681 *   9,330        80,911,430        80,993,730  FALSE
      278 *     903           250,734           251,034  FALSE
       82 *      77             6,314             6,314  TRUE

Edit: The number of correct leading digits (counting from the left) exceeds 3 only on the smallest pair. It also drops to two on the 3-digit by 3-digit pair.
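For anyone who wants to recheck the table, a small sketch that recomputes each product and counts how many leading digits of the reported answer agree with it. The helper name `leading_match` is made up for this example; the numbers are copied from the table.

```python
def leading_match(a: int, b: int) -> int:
    """Number of leading digits on which a and b agree (0 if lengths differ)."""
    sa, sb = str(a), str(b)
    if len(sa) != len(sb):
        return 0
    n = 0
    for x, y in zip(sa, sb):
        if x != y:
            break
        n += 1
    return n

# (left operand, right operand, answer reported in the table)
rows = [
    (397_356, 930_547, 369_685_207_932),
    (36_330, 26_951, 979_458_630),
    (8_681, 9_330, 80_911_430),
    (278, 903, 250_734),
    (82, 77, 6_314),
]

for a, b, reported in rows:
    actual = a * b
    print(f"{a:,} * {b:,} = {actual:,}  reported {reported:,}  "
          f"leading digits correct: {leading_match(reported, actual)}")
```

The counts come out as 3, 3, 3, 2, 4 from the top row down.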
You probably used GPT-3.5. That said, I didn't manage to get GPT-4 to calculate 36,330 * 26,951 correctly. I suggested casting out 9s, casting out 11s, doing long multiplication, reversing the digits - nothing.

I have a theory that it does arithmetic badly because the logic goes right-to-left, when LLMs write left-to-right. If the digits were to be reversed, it might not make as many mistakes. I ran out of attempts before I could test this properly.

Did you adjust the prompt to ask it how a famous mathematician would answer the question? Or what a calculator would say the answer is?

Sometimes LLMs get math wrong because people got math wrong in the training data, and so they match the error frequency (https://learnprompting.org/docs/basics/roles).

Yes, that's exactly what people are saying here. It's not a criticism of the tool; it's an example of what the tool is and how it functions.
I just had the following conversation on ChatGPT:

> Human: what is 36,330 26,951 * (input is expressed in reverse post fix notation)

> chatGPT: To evaluate this expression using reverse Polish notation, we need to use a stack to keep track of the intermediate results. Here's how we can evaluate the expression:

1. Read the first number "36,330" and push it onto the stack.
2. Read the second number "26,951" and push it onto the stack.
3. Read the operator "*", pop the top two numbers (26,951 and 36,330) from the stack, multiply them, and push the result (971,914,830) onto the stack.
4. The stack now contains only one number, which is the final result.

Therefore, 36,330 26,951 * = 971,914,830 in reverse Polish notation.
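For comparison, the stack procedure ChatGPT describes is easy to run for real. This is a minimal sketch that assumes space-separated tokens, integer operands with optional thousands separators, and only the three operators shown; `eval_rpn` is a name made up for this example.

```python
def eval_rpn(expression: str) -> int:
    """Evaluate a space-separated integer RPN expression using a stack."""
    ops = {
        "+": lambda x, y: x + y,
        "-": lambda x, y: x - y,
        "*": lambda x, y: x * y,
    }
    stack = []
    for token in expression.split():
        if token in ops:
            # The operand pushed last is the right-hand side.
            right = stack.pop()
            left = stack.pop()
            stack.append(ops[token](left, right))
        else:
            stack.append(int(token.replace(",", "")))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

result = eval_rpn("36,330 26,951 *")
print(f"{result:,}")  # 979,129,830
```

So the description of the method is right, and only the multiplication step itself came out wrong.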

I don't think this is about notation.

You should ask it to show its working. The problem is that it uses a fixed amount of thinking for each character - and it writes left to right - while long multiplication is computed right to left. Past a certain number of digits, the amount of "thinking" needed to compute long multiplication in the wrong order (left to right) increases without bound -- accuracy suffers as a result.
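A sketch of what "computed right to left" means here: schoolbook long multiplication on digit lists stored least significant digit first. This illustrates the pencil-and-paper algorithm only, and says nothing about what the model does internally; `long_multiply` is a name made up for this example and it assumes non-negative integers.

```python
def long_multiply(a: int, b: int) -> int:
    """Schoolbook multiplication on digit lists, least significant digit first."""
    xs = [int(d) for d in reversed(str(a))]
    ys = [int(d) for d in reversed(str(b))]
    out = [0] * (len(xs) + len(ys))
    for i, x in enumerate(xs):
        carry = 0
        for j, y in enumerate(ys):
            # Each output digit absorbs a partial product plus the carry
            # coming up from the less significant position.
            total = out[i + j] + x * y + carry
            out[i + j] = total % 10
            carry = total // 10
        out[i + len(ys)] += carry
    return int("".join(map(str, reversed(out))))

print(f"{long_multiply(36_330, 26_951):,}")  # 979,129,830
```

The last digit of the result is settled after the very first step, while the leading digits are only final once every carry has propagated. Something that writes the answer left to right has to commit to those leading digits first.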

A tangential point: Note that multiplication of decimal numbers _has_ to start with the rightmost digit to be done accurately. Consider x = 2/3 + 1/3 = 0.6(6) + 0.3(3), all digits repeating infinitely many times, so there's no rightmost digit: Notice that whichever digit you choose for x before the decimal point (either 0 or 1) could be wrong if you change any digit of 1/3. This is called the Table Maker's Dilemma.
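A quick sketch of that carry effect, using truncations to n decimal digits so there is a rightmost digit to perturb. The function name and the `bump` parameter are made up for this example.

```python
def units_digit_of_sum(n: int, bump: int = 0) -> int:
    """Digit before the decimal point of 0.66...6 + 0.33...3 with n digits each.

    `bump` is added to the last (n-th) decimal digit of the second number.
    """
    a = int("6" * n)          # 0.66...6 scaled by 10**n
    b = int("3" * n) + bump   # 0.33...3 scaled by 10**n, last digit perturbed
    return (a + b) // 10**n

for n in (5, 50, 500):
    print(n, units_digit_of_sum(n), units_digit_of_sum(n, bump=1))
```

For every n the unperturbed sum is 0.99...9 and the perturbed one is 1.00...0, so a change in the n-th decimal place flips the digit before the decimal point no matter how large n is.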

The second paragraph is a tangent. My point was made in the first paragraph.

You said: "I don't think this is about notation" - Why did you ever think it was?

I was thinking along the lines of maybe it reads the digits as individual tokens, and reading from left to right it gets confused. That's why I thought it might be about notation. I have learned it's definitely not the case (I read a bit about what tokens actually mean).

I also tried the show-your-work method. I will not paste my conversations here as they are so long, but even when showing its work it doesn't give the right answer. Two funny things I noticed:

1. It either uses a completely wrong method to calculate it, or it shows the right "kind" of work and then gets the wrong answer

2. If I tell it it was wrong, it will just agree with me. If I multiply two huge numbers and tell it "no, the answer is actually 42", it will just apologize. Then when I ask why it thinks I'm right, it will give the most bullshit reasons lol. Once it even said "I read the number wrong, the second number should be x, I read y". The thing is, if I had actually given x in the input it would have gotten the answer right.

It's all very confusing.

That’s not ‘basic arithmetic’. Most humans would need a pen and paper to solve it.

GPT’s attention window is not equivalent to it being able to ‘write stuff down’ - it’s its equivalent of being able to remember the foregoing few minutes of conversation.

Stop redefining things to support your argument. "The basic operations under arithmetic are addition and subtraction, division and multiplication"
Right, but when you say an LLM can’t do basic arithmetic you’re presumably talking about mental arithmetic, right?
No.
Unless you're using Plus (black icon), you're using GPT-3.5 (green icon).
Interesting. I didn't use numbers as large as yours, just ones in the low thousands, and ChatGPT w/ GPT-4 got them correct 3 times in a row.

What's also interesting, though not necessarily surprising, is how relatively close the incorrect answers are in your examples. It's not like the model will randomly spit out an answer like 42 if you ask it to multiply something like 36,330 by 26,951.

> I assume it used GPT4

You did not. You have to explicitly select it from the dropdown which is only available on paid Plus accounts.