by versteegen
1232 days ago
Grokking doesn't just happen for algorithmic data; it also happens, less dramatically, on other datasets [3]. Grokking seems to be closely related to double descent [4], which is quite widespread. Anyway, I only wanted to give grokking as an example of how memorisation doesn't preclude generalisation; it may simply precede it.
> That's not what GPT is going.
I don't follow. Of course GPT models are learning representations (but I doubt you meant to deny this); that's how they can do semantic matching against their knowledge base (memorised information) in order to generalise from it. They don't only spit out training data verbatim. Anyway, I didn't claim any GPT variant has actually "learn[t] math", only that it's not impossible with unlimited training.
[3] Liu &al. Omnigrok: Grokking Beyond Algorithmic Data https://openreview.net/forum?id=zDiHoIWa0q1
[4] Davies &al. Unifying Grokking and Double Descent https://openreview.net/pdf?id=JqtHMZtqWm
|
> They verify this observation in a student teacher setup, and show that it can arise in non-algorithmic datasets if initialized in a certain weight regime for appropriate sample size.
It’s not a widespread phenomenon by any means and it is not observably happening inside GPT. No amount of training will change that, only a drastic specialization of the training data (which defeats the purpose).
> They don't only spit out training data verbatim.
I’m not saying verbatim. But I am saying it won’t return a pattern it hasn’t seen in its dataset before. The whole point of attention is that the token isn’t just the word, but the word as it exists in context. If you expand "verbatim" to include that as the token, then yes, that is exactly what GPT does: it will not connect two tokens unless it was trained on data that implies those tokens should be connected, and it knows nothing else about what those tokens are.
Again, to put it simply: a 3rd grader can multiply any two numbers (and I mean literally the infinite set of them). GPT cannot, and never will be able to, multiply an infinite set of numbers.
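To make the 3rd-grader point concrete, here is a minimal sketch of the schoolbook long-multiplication procedure on digit strings. The function name `schoolbook_multiply` is my own, purely for illustration. The point is only that a fixed, finite set of rules (single-digit products plus carrying) covers inputs of any length; it says nothing about what a GPT model does internally.

```python
def schoolbook_multiply(a: str, b: str) -> str:
    """Multiply two non-negative decimal integers given as digit strings.

    Uses only single-digit products and carries, the way it is taught
    in school, so the same rules apply to inputs of any length.
    """
    digits = "0123456789"
    for s in (a, b):
        if not s or any(ch not in digits for ch in s):
            raise ValueError("expected a non-empty string of decimal digits")

    # Store digits least-significant first so index i means 10**i.
    x = [int(ch) for ch in reversed(a)]
    y = [int(ch) for ch in reversed(b)]
    result = [0] * (len(x) + len(y))

    for i, dx in enumerate(x):
        carry = 0
        for j, dy in enumerate(y):
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10   # digit that stays in this column
            carry = total // 10          # digit carried to the next column
        result[i + len(y)] += carry      # final carry of this row

    # Drop leading zeros, keeping at least one digit.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return "".join(str(d) for d in reversed(result))
```

For example, `schoolbook_multiply("12", "34")` returns `"408"`, and the same code handles operands with hundreds of digits without any change to the rules.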