by RC_ITR
1229 days ago
>Not so. Actually, (for example) the phenomenon of "grokking" is when with enough training a NN eventually experiences a phase-change from memorising data to learning the general rules underlying it.
Reading the paper, what they seem to be getting at is "when the dataset is algorithmic (like multiplication tables), the parameters get set in a way that appears to replicate the algorithm." That's cool, but not what GPT is.
>I feel that people seem to have forgotten that deep learning is so powerful because it performs feature/representation learning, not because it can memorise, although that's powerful too. IMO that is the proper definition of 'deep learning'.
That's not what GPT is doing.
> That's not what GPT is doing.
I don't follow. Of course GPT models are learning representations (but I doubt you meant to deny this); that's how they can semantically match against their knowledge base (memorised information) in order to generalise from it. They don't only spit out training data verbatim.
Anyway, I didn't claim any GPT variant has actually "learn[t] math", but that it's not impossible with unlimited training.
[3] Liu &al. Omnigrok: Grokking Beyond Algorithmic Data https://openreview.net/forum?id=zDiHoIWa0q1
[4] Davies &al. Unifying Grokking and Double Descent https://openreview.net/pdf?id=JqtHMZtqWm