Hacker News
by kypro 1100 days ago
I'm not personally convinced LLMs are bad at arithmetic. I think they might just approach it differently to us.

Something you'll find if you ever train a neural network to learn a mathematical function is that it will only ever approximate that function. It won't try to guess exactly what the function is, like a human might.

For example consider, f(1) = 2, f(2) = 4, f(3) = 6, f(4) = 8, f(5) = 10.

As a human you know how important precision is in maths, and you know humans generally like round numbers, so you naturally assume that f(x) = x * 2

Neural networks don't have these biases by default. They'll look for a function that gets close enough, maybe something like f(x) = x * 1.993929910302942223

From a neural network's perspective the loss between this answer and the actual answer is so small that it's basically irrelevant.

Then a human who likes round numbers comes along and asks the network, what's f(10,000)? To which the neural network replies, 19939.3

The human then goes away convinced the AI doesn't know maths, when in reality the AI basically does know maths, it just doesn't care as much about arithmetic precision as the human does. Because again, to the AI 19939.3 is a perfectly acceptable answer.
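To make that concrete, here is a minimal sketch, assuming a one-parameter model f(x) = w * x trained with plain gradient descent on the five points above. The function name, step count and learning rate are arbitrary choices for illustration, not anything a real LLM does. The point is only that training stops at "close enough" rather than at exactly 2.

```python
# Hypothetical illustration: fit f(x) = w * x to the five example points
# with plain gradient descent, stopping before full convergence.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]


def fit_slope(steps, lr=0.01, w=0.0):
    """Run a fixed number of gradient descent steps on mean squared error."""
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


w = fit_slope(steps=20)

# Evaluate well outside the training range and compare with the exact rule.
far = 1000
predicted = w * far
exact = 2 * far
rel_error = abs(predicted - exact) / exact

print(f"learned slope: {w}")
print(f"f({far}) predicted {predicted}, exact {exact}, relative error {rel_error:.4%}")
```

The learned slope ends up just under 2, so the relative error is tiny on the training points and stays the same tiny fraction far outside them, even though the answer no longer looks "round".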

So now for fun let me ask ChatGPT some arithmetic questions...

> ME

> what's 2343423 + 9988733?

> ChatGPT

> The sum of 2343423 and 9988733 is 12392156.

WRONG! It's actually 12332156. That's an entire digit out and almost 0.5% larger than the actual answer!

> ME

> what is 8379270 + 387299177?

> ChatGPT

> The sum of 8379270 and 387299177 is 395678447.

Er, okay, that was right. Bad example, let me try again.

> ME

> what is 2233322223333 + 387299177?

> ChatGPT

> The sum of 2233322223333 and 387299177 is 2233322610510.

WRONG! It's actually 2233709522510. That's 6 digits out and almost 0.02% smaller than the actual answer!
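For anyone who wants to check the figures above, here is a small sketch that recomputes the three sums and measures each reply both ways: how many digits differ and how large the relative error is. The replies are copied from the transcript above; the helper name `compare` is just for illustration.

```python
# (a, b, reply) triples copied from the transcript above.
cases = [
    (2343423, 9988733, 12392156),
    (8379270, 387299177, 395678447),
    (2233322223333, 387299177, 2233322610510),
]


def compare(a, b, reply):
    """Return the exact sum, the relative error of the reply, and the number of differing digits."""
    exact = a + b  # Python integers are arbitrary precision, so this is exact
    rel_error = abs(reply - exact) / exact
    exact_s, reply_s = str(exact), str(reply)
    wrong_digits = sum(x != y for x, y in zip(exact_s, reply_s))
    wrong_digits += abs(len(exact_s) - len(reply_s))  # count any length mismatch too
    return exact, rel_error, wrong_digits


results = [compare(*case) for case in cases]
for (a, b, reply), (exact, rel_error, wrong_digits) in zip(cases, results):
    print(f"{a} + {b}: reply {reply}, exact {exact}, "
          f"{wrong_digits} digit(s) off, relative error {rel_error:.4%}")
```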

If you take a more open-minded view I think it's fair to say ChatGPT basically does know arithmetic, but its reward function probably didn't prioritise arithmetic precision in the same way a decade of schooling does for us humans. For ChatGPT, having a few digits wrong in an arithmetic problem is probably less important than its reply containing that sum being slightly improperly worded.

I guess what I'm saying is that I'm not sure I quite agree with the author that LLMs don't do arithmetic at all. It's not that they're trying to guess the next word without arithmetic, but more that they're not doing arithmetic the same way we humans do it. Which may have been the point the author was making... I'm not really sure.

2 comments

LLMs are bad at math because they don't actually understand the rules of math.

They can write code to do math, but without code they can only estimate how likely a series of numbers is to be seen together.

They're very likely to get things like 2+2=4 correct because that's probably unique and common in their training data. They're unlikely to get the sum of two random numbers correct because they don't actually know what those numbers mean.

What would an LLM have to do to convince you it was good at math? Check out this recent post by OpenAI where one of their models is solving 60%+ of problems from a high school math competition dataset: https://openai.com/research/improving-mathematical-reasoning...
It’s actually better at math than it is at arithmetic, and I think this discussion has been about arithmetic. I could make up something about how math is more like language than arithmetic is. I suspect the hypothesis that math tests tend to have a lot of stereotypical problem structures from a shared curriculum is also relevant. But who knows at this point?

Anyway, to convince me it’s good at arithmetic is not complicated…just be good at arithmetic! That is, do it correctly, every time, for any size number.

>That is, do it correctly, every time, for any size number.

Then no human is good at arithmetic.

I suspect most people on this forum can do arithmetic for any "reasonable" size number. It might take weeks to complete, but most people on this forum can calculate large numbers by hand.
Goalpost moving. "Reasonable" is just an arbitrary line. Especially since most if not all would make some mistake somewhere along the line.

You can greatly increase GPT's arithmetic capabilities by tackling it like a problem to solve "on paper" in context. And this was done on 3.5, not 4. https://arxiv.org/abs/2211.09066
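As a rough sketch of what working "on paper" means here: this is ordinary schoolbook addition with every carry written out, not the prompt format from the linked paper, and `add_on_paper` is a made-up name. Each step only ever involves two single digits and a carry, which is the kind of small, checkable step a written-out working relies on.

```python
def add_on_paper(a: str, b: str):
    """Add two non-negative decimal strings digit by digit, recording each step."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter number with leading zeros

    carry = 0
    digits = []  # result digits, least significant first
    steps = []   # the written-out working, one line per column

    # Walk the columns from right to left, as you would by hand.
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        steps.append(f"{da} + {db} + carry {carry} = {total}")
        digits.append(str(total % 10))
        carry = total // 10

    if carry:
        digits.append(str(carry))

    return "".join(reversed(digits)), steps


result, steps = add_on_paper("2233322223333", "387299177")
print("\n".join(steps))
print("result:", result)
```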

If it's going to take weeks, most people will get it wrong. That's a lot of calculations to never get wrong, and a lot of prior notes to never misinterpret.
Okay, but we have since invented machines that can do arithmetic correctly, every time. When we try to do maths via an LLM, we're just throwing all of that away.
So? I didn't tell you to use GPT-4 for arithmetic over a calculator. I simply pointed out that the only standard by which GPT-4 is not good at arithmetic is a standard humans wouldn't meet either. Especially since zero-shot "mental" arithmetic is not even close to GPT-4 at its most accurate.
Fair enough, I’ll allow a 1% error rate per 10 addend digits.
ChatGPT, and probably GPT-4 too, is also hilariously bad at "more advanced" mathematics, including trying to come up with even slightly original proofs.
I think the statement that LLMs don't understand the rules of maths is far too strong. And this notion that LLMs are not able to answer a random arithmetic question "correctly" only holds if you assume "correctness" is a binary and not a scalar.

I'd propose that your claim that LLMs don't understand maths is very similar to the claim that Newton didn't understand the Laws of Motion.

Yes – Newton's laws are wrong, but they're also practically correct for 99.999% of applications. If correctness is viewed as a binary, Newton is 100% wrong, but as a scalar Newton is basically right.

Neural networks are inherently bad at finding exact rules, but they're excellent at approximating them to an accuracy that is acceptably good. This is the bit that people miss when they say LLMs can't do maths.

When you claim they don't understand the rules of maths, I agree that they don't understand the explicit rules, but with the caveat that they probably understand something that allows them to approximate those rules "well enough".

This is why if you ask ChatGPT a question like 23435234 + 3243423 it's not going to say -33.1. It might not give the right answer, but it will almost always give you something that's close and very plausible. So while it might not understand the exact rules, it basically understands what happens when you add two numbers and 99% of the time will give you an answer that is basically correct.

The larger point I was trying to make here is that I think we humans are kinda biased when it comes to maths because we understand character precision which is the bias I think you're basing your reasoning on here. We humans believe precision is extremely important in the context of maths unlike other textual content. But an LLM isn't operating with that bias. It's just trying to approximate maths in a way that is correct enough in a similar way that it's trying to approximate the likely next character (or more correctly token) of other text content.

I don't think approximations are 100% wrong, and perhaps us humans being bothered about LLMs giving answers to maths questions that are 0.1% wrong actually says more about our values and how we view maths than it says about an LLM's mathematical abilities.

You're just trying to redefine "mathematics" in order to be able to say that ChatGPT is good at it. But mathematics is about precision.
The exactness matters, though. Unless you'd like things like encryption to stop working.