| I'm not sure I'm convinced LLMs are bad at arithmetic; I think they might just approach it differently to us.

Something you'll find if you ever train a neural network to learn a mathematical function is that it will only ever approximate that function. It won't try to guess exactly what the function is, like a human might. For example, consider f(1) = 2, f(2) = 4, f(3) = 6, f(4) = 8, f(5) = 10.

As a human you know how important precision is in maths, and you know humans generally like round numbers, so you naturally assume that f(x) = x * 2.

Neural networks don't have these biases by default. They'll look for a function that gets close enough, maybe something like f(x) = x * 1.993929910302942223.

From a neural network's perspective the loss between this answer and the actual answer is so trivial that it's basically irrelevant. Then a human who likes round numbers comes along and asks the network, what's f(1,000)? To which the neural network replies, 1993.93.

The human then goes away convinced the AI doesn't know maths, when in reality the AI basically does know maths, it just doesn't care as much about arithmetic precision as the human does. Because again, to the AI 1993.93 is a perfectly acceptable answer.

So now for fun let me ask ChatGPT some arithmetic questions...

> ME
> what's 2343423 + 9988733?
> ChatGPT
> The sum of 2343423 and 9988733 is 12392156.

WRONG! It's actually 12332156. That's an entire digit out and almost 0.5% larger than the actual answer!

> ME
> what is 8379270 + 387299177?
> ChatGPT
> The sum of 8379270 and 387299177 is 395678447.

Er, okay, that was right. Bad example, let me try again.

> ME
> what is 2233322223333 + 387299177?
> ChatGPT
> The sum of 2233322223333 and 387299177 is 2233322610510.

WRONG! It's actually 2233709522510. That's 6 digits out and almost 0.02% smaller than the actual answer!
If you take a more open-minded view, I think it's fair to say ChatGPT basically does know arithmetic, but its reward function probably didn't prioritise arithmetic precision in the same way a decade of schooling does for us humans. For ChatGPT, having a few digits wrong in an arithmetic problem is probably less important than the reply containing that sum being slightly improperly worded. I guess what I'm saying is that I'm not sure I quite agree with the author that LLMs don't do arithmetic at all. It's not that they're trying to guess the next word without arithmetic, but more that they're not doing arithmetic the same way we humans do it. Which may have been the point the author was making... I'm not really sure. |
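The f(x) thought experiment above can be tried out directly. What follows is only a toy sketch under stated assumptions: a one-weight model f(x) = w * x fitted by plain gradient descent, with a learning rate and a "close enough" tolerance picked arbitrarily for illustration. It says nothing about how ChatGPT itself was trained; it just shows that a tiny training loss on the five points can still leave a visible error at x = 1000.

```python
# Toy illustration: fit f(x) = w * x to the five points from the comment
# by gradient descent, stopping as soon as the loss is "close enough".
# The learning rate and tolerance are arbitrary assumptions for this sketch.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]


def mean_squared_error(w):
    """Average squared difference between w * x and the target y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(learning_rate=0.01, tolerance=1e-3, max_steps=10_000):
    """Run gradient descent on w until the loss drops below the tolerance."""
    w = 0.0
    for _ in range(max_steps):
        if mean_squared_error(w) < tolerance:
            break  # good enough from the optimiser's point of view
        # Derivative of the mean squared error with respect to w.
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w


w = fit()
print(f"learned w        = {w:.6f}")
print(f"training loss    = {mean_squared_error(w):.6f}")
print(f"f(1000) estimate = {w * 1000:.1f} (exact answer: 2000)")
```

The optimiser stops once the loss on the training points is under the tolerance, so the learned weight ends up near 2 without being exactly 2, and the gap is multiplied by a thousand when you ask for f(1000).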
They can write code to do math, but without code they can only estimate how likely a series of numbers is to be seen together.
They're very likely to get things like 2+2=4 correct because that's probably very common in their training data. They're unlikely to get the sum of two random numbers correct because they don't actually know what those numbers mean.
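To make the "write code to do math" point concrete, here is a minimal sketch that redoes the three sums quoted in the parent comment with ordinary integer addition. The `check` helper and the `quoted` list are names made up for this example, and it only illustrates that code gives exact answers; it isn't a claim about how any particular LLM calls out to code.

```python
# Sums quoted in the parent comment: (a, b, answer ChatGPT gave).
quoted = [
    (2343423, 9988733, 12392156),
    (8379270, 387299177, 395678447),
    (2233322223333, 387299177, 2233322610510),
]


def check(a, b, claimed):
    """Return the exact sum and the relative error of the claimed answer."""
    exact = a + b  # Python ints are arbitrary precision, so this is exact
    relative_error = abs(claimed - exact) / exact
    return exact, relative_error


for a, b, claimed in quoted:
    exact, rel = check(a, b, claimed)
    verdict = "correct" if claimed == exact else f"off by {rel:.4%}"
    print(f"{a} + {b} = {exact}; quoted reply {claimed} is {verdict}")
```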