by famouswaffles 1160 days ago
Not too shocking for me after this paper. https://arxiv.org/abs/2211.09066

You can teach GPT-3 arithmetic - https://imgur.com/a/w3DAYOi

Basically 100% accuracy up to about 13 digit addition and >90 after that.
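
For anyone wondering what "teaching" looks like here: as far as I can tell, the prompt spells out the schoolbook carry algorithm one digit at a time. Below is a rough Python sketch of that kind of step-by-step trace. The function name `addition_trace` and the wording of each step are made up for illustration; this is not the paper's actual prompt format.

```python
def addition_trace(a: int, b: int) -> tuple[list[str], int]:
    """Add two non-negative integers digit by digit, recording each step.

    Hypothetical helper for illustration only.
    """
    xs = [int(d) for d in str(a)][::-1]  # least significant digit first
    ys = [int(d) for d in str(b)][::-1]
    steps, digits, carry = [], [], 0
    for i in range(max(len(xs), len(ys))):
        x = xs[i] if i < len(xs) else 0  # pad the shorter number with zeros
        y = ys[i] if i < len(ys) else 0
        total = x + y + carry
        steps.append(
            f"{x} + {y} + carry {carry} = {total}: "
            f"write {total % 10}, carry {total // 10}"
        )
        digits.append(total % 10)
        carry = total // 10
    if carry:
        # A leftover carry becomes the most significant digit.
        steps.append(f"final carry {carry}: write {carry}")
        digits.append(carry)
    result = int("".join(str(d) for d in reversed(digits)))
    return steps, result


steps, result = addition_trace(9999999999999, 1)  # a 13-digit example
print("\n".join(steps))
print("answer:", result)
```

Each step only needs the two current digits and the carry, which is presumably why spelling it out this way helps on long numbers.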

What else can you teach GPT without changing weights?

3 comments

> and >90 after that

This is such a circular thing that I find it amazing to see.

LLMs use a NN because they're trying to encode a probability function for generating the passage.

And now you are encoding another n-gram follower exercise (e.g. 1+1 = 2) on top of it :)

Yeah... and I'm kind of suspicious of the whole "without changing the weights" deal, because adding working context to the model, like telling it the algorithm for adding numbers, really sounds like there's some model state that's getting mutated, even if it's not stored in a file called weights.dat or whatever.
I meant shocking in the sense that it makes me gape in awe, but as I wrote, it's also, simultaneously, completely unsurprising given all the new emergent capabilities we keep discovering. We're in agreement :-)
Oh. Yes, well, that's fair.
> 100% accuracy up to about 13 digit addition

The graphs you just posted do not support that; at most they'd support 100% accuracy up to 4 digits.

it's GPT so 13=4
It's 100% at 13 digits and extremely close to that before then. Maybe "basically 100%" is the better wording.