Hacker News
by d--b 1596 days ago
What? You think this is poor performance?

This totally blows my mind. I would never have guessed that GPT could get ANY of these right.

I mean, is there a data point in the dataset used to train where you can read 2241 + 19873 = 22114? Quite unlikely...

And those multiplications. It's consistently getting the number of digits right and the first two digits correct. How the hell does this happen?

Sure, it's sometimes way off. But generally it is in the right ballpark.
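
One way to make "right number of digits, right leading digits, right ballpark" concrete is a small scoring helper. This is a hypothetical sketch, not how the results in the linked post were graded, and the wrong answer below is made up for illustration (it is not real GPT-3 output).

```python
def ballpark_report(predicted: int, actual: int, leading: int = 2) -> dict:
    """Compare a numeric answer with the true value (assumes positive integers)."""
    p, a = str(predicted), str(actual)
    return {
        "exact": predicted == actual,
        # Same number of digits, i.e. same order of magnitude.
        "same_digit_count": len(p) == len(a),
        # The first `leading` digits agree.
        "leading_digits_match": p[:leading] == a[:leading],
        # Relative error |predicted - actual| / |actual|.
        "relative_error": abs(predicted - actual) / abs(actual),
    }

actual = 1234 * 5678   # 7006652
guess = 7006952        # hypothetical wrong answer: one digit off
report = ballpark_report(guess, actual)
print(report)
```

An answer can fail `exact` while passing the other three checks, which is roughly the "wrong but in the right ballpark" behaviour described above.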

I certainly think people should look into what's happening inside the model.


Whether you feed it text or encoded numbers, text as encoded is just another base for representing a number: the ASCII string '325' is '3' * 2^16 + '2' * 2^8 + '5' * 2^0 = 51 * 2^16 + 50 * 2^8 + 53 = 3355189. Neural networks can approximate polynomials, and addition, subtraction and multiplication can be approximated too; in this case the base is just not 10 but 256 (one ASCII byte per character).

I think that if you trained it only on arithmetic expressions (in text) it should do even better: it would just approximate the underlying operation, without needing to understand the text.

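
The encoding point can be checked directly by reading the ASCII bytes of a numeral as big-endian base-256 digits. Treat this as an analogy only: GPT-style models see byte-level BPE tokens, so a number like 19873 may be split into multi-digit chunks rather than one byte per digit. The helper names here are illustrative, not from any library.

```python
def ascii_as_base256(numeral: str) -> int:
    """Interpret the ASCII bytes of `numeral` as big-endian base-256 digits."""
    value = 0
    for ch in numeral:
        value = value * 256 + ord(ch)  # shift one base-256 place, add this byte
    return value

def decimal_value(numeral: str) -> int:
    """Read the same string as base-10 digits: subtract ord('0') from each byte."""
    value = 0
    for ch in numeral:
        value = value * 10 + (ord(ch) - ord("0"))
    return value

s = "325"
print(ascii_as_base256(s))                       # 51*256**2 + 50*256 + 53 = 3355189
print(int.from_bytes(s.encode("ascii"), "big"))  # same value via the standard library
print(decimal_value(s))                          # 325
```

The two readings differ only in the per-digit offset (ord('0') = 48) and the base (256 vs 10).
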
Maybe this is an example of where you need an "extra specialized skill" (arithmetic) as opposed to the general, semi-ambiguous skill of language and conversation.

GPT-3 is "good with conversation (language)".

GPT-3 now needs a "sub-NN-model" for the very specialized skill called math.

GPT-3 should "learn" to recognize which questions should be delegated to a submodel.
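
A hand-written router makes the delegation idea concrete. In the scheme described above the routing would be learned; here a regular expression stands in for that learned classifier, and `language_model` is a hypothetical placeholder, not a real GPT-3 API.

```python
import ast
import operator
import re

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

# Looks like arithmetic: optional "what is", then digits, spaces and + - * only.
ARITHMETIC = re.compile(r"^\s*(?:what is\s+)?([\d\s+\-*]+?)\s*\??\s*$", re.IGNORECASE)

def calculator(expr: str) -> int:
    """Exact 'math submodel': evaluates +, -, * on integers by walking the AST."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

def language_model(prompt: str) -> str:
    """Hypothetical stand-in for the general conversational model."""
    return f"[language model answer to: {prompt!r}]"

def answer(question: str) -> str:
    """Delegate arithmetic to the calculator, everything else to the language model."""
    match = ARITHMETIC.match(question)
    if match and any(op in match.group(1) for op in "+-*"):
        try:
            return str(calculator(match.group(1)))
        except (SyntaxError, ValueError):
            pass  # looked like math but did not parse; fall back
    return language_model(question)

print(answer("What is 2241 + 19873?"))  # 22114
print(answer("Who wrote Hamlet?"))
```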

I think this is the idea behind Google's Pathways (a Mixture of Experts model). In a sense every model already works like that, but I think they train it differently so that the parts are more separated.

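
For comparison, a learned version of that routing is what mixture-of-experts layers do in general: a gating function scores each expert for the input, and the output is a softmax-weighted sum of the experts' outputs, often keeping only the top-k experts. This plain-Python sketch uses toy scalar experts and made-up gate weights; it illustrates gating in general and is not a description of how Pathways is implemented.

```python
import math
from typing import Callable, List, Sequence

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe(x: Sequence[float],
        experts: List[Callable[[Sequence[float]], float]],
        gate_weights: List[Sequence[float]],
        top_k: int = 1) -> float:
    """Mixture of experts with a linear gate and top-k routing."""
    # Gate logit for expert i is the dot product w_i . x.
    logits = [sum(w * xi for w, xi in zip(wi, x)) for wi in gate_weights]
    # Keep only the top_k highest-scoring experts (sparse routing).
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in chosen])  # renormalise over chosen experts
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))

# Two toy experts: one adds the first two inputs, one multiplies them.
experts = [lambda v: v[0] + v[1], lambda v: v[0] * v[1]]
# Hypothetical gate weights; the third input is -1 for "sum" and +1 for "product".
gate = [[0.0, 0.0, -5.0], [0.0, 0.0, 5.0]]

print(moe([3.0, 4.0, -1.0], experts, gate, top_k=1))  # 7.0
print(moe([3.0, 4.0, 1.0], experts, gate, top_k=1))   # 12.0
```

In a real model the gate weights are trained jointly with the experts rather than set by hand.
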
What fascinates me most is that the errors are very "human-like". If you gave me multi-digit multiplication and addition problems like these, I would often get similar results: most digits right, but a mistake in one or a few of them.

>Sure, it's sometimes way off. But generally it is in the right ballpark.

Which is worse than being completely off. It just shows how the model works: it treats mathematics like language. There are lots of examples in the dataset, so similar-sounding inputs produce similar-sounding outputs.

This is akin to sitting in a foreign-language lecture where you don't understand a single word being spoken and trying to answer questions by making similar-sounding noises. You may give answers that sound better than random, but in reality you haven't learned anything.

If these models understood mathematical laws, the mistakes they produced would be arithmetic errors, like an answer with the wrong sign, not jumbled digits.

>I mean, is there a data point in the dataset used to train where you can read 2241 + 19873 = 22114? Quite unlikely...

But there might be something like xxx1 + xxxx3 = xxxx4 in the dataset, so it can learn the pattern.
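
That particular pattern is real: the last digit of a sum depends only on the last digits of the operands, (a + b) mod 10 = ((a mod 10) + (b mod 10)) mod 10, so it can in principle be picked up from many unrelated examples. A quick spot-check (the helper name is illustrative):

```python
def last_digit_of_sum(a: int, b: int) -> int:
    """Units digit of a + b computed from the operands' units digits alone."""
    return (a % 10 + b % 10) % 10

# Spot-check the rule on a spread of non-negative operands.
assert all(last_digit_of_sum(a, b) == (a + b) % 10
           for a in range(0, 2000, 7) for b in range(0, 20000, 13))

print(last_digit_of_sum(2241, 19873), (2241 + 19873) % 10)  # 4 4
```

The higher digits are harder, because each also depends on carries from the positions to its right.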

That's the astonishing bit.

It really isn’t. You see a lot of things when reading 500 billion tokens.

Yeah, I'm *totally* unimpressed.

I, for one, learned all my math without ever seeing any math or logic examples at all.

"Teacher, what is this '34+12' stuff - I've already developed a complete grand unified theory on my own - I don't need examples of what you call 'addition'" - apparently everyone unimpressed by NLP today

They didn't mean that it was astounding that something of the form "xxx1 + xxxx3 = xxxx4" was in the training set, but that it managed to "learn the pattern".

They should have asked questions like:

What is twothousandtwohundredfortyone plus nineteenthousandeighthundredseventythree?
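
Probes like that are easy to generate. A minimal sketch of a hypothetical `to_words` helper that spells out integers below one million in English with no spaces, applied to the operands 2241 and 19873 from the original post:

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def below_thousand(n: int) -> str:
    """Spell 0 < n < 1000 with no spaces, e.g. 873 -> 'eighthundredseventythree'."""
    words = ""
    if n >= 100:
        words += ONES[n // 100] + "hundred"
        n %= 100
    if n >= 20:
        words += TENS[n // 10]
        n %= 10
    if n > 0:
        words += ONES[n]
    return words

def to_words(n: int) -> str:
    """Spell 0 <= n < 1_000_000 in English words with no spaces."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0 <= n < 1,000,000 is supported")
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    words = ""
    if thousands:
        words += below_thousand(thousands) + "thousand"
    if rest:
        words += below_thousand(rest)
    return words

a, b = 2241, 19873
print(f"What is {to_words(a)} plus {to_words(b)}?")
```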

You might like the paper "Do NLP Models Know Numbers? Probing Numeracy in Embeddings": https://arxiv.org/abs/1909.07940

Neural network models seem to encode an approximate notion of quantity in their representations. This paper is pre-GPT-3, but I would think the larger training set and larger model capacity would help the model learn quantity more easily.

It's unlikely that "2241 + 19873 = 22114" specifically is in the dataset, but very likely that there are many expressions equivalent to that expression in the dataset, and we've just picked one of those.

Imagine someone watching every lottery draw and, after each draw, going "Wow! The chances of those exact numbers coming up in that order are astronomical!"
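
To put a rough number on the lottery analogy: counting only problems of the form "4-digit number plus 5-digit number", there are already 9,000 × 90,000 possibilities, so any one specific sum is unlikely to appear verbatim even if many sums of that shape do.

```python
four_digit = range(1000, 10000)    # 9,000 values
five_digit = range(10000, 100000)  # 90,000 values
print(f"{len(four_digit) * len(five_digit):,}")  # 810,000,000
```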

> there are many expressions equivalent to that expression in the dataset

What do you mean by this?

I meant that I expect there are many examples of "a + b = c" in the training corpus, so GPT-3 will answer some of them correctly.