by RC_ITR
1227 days ago
It's important to remember the first principle of what GPT does. It looks at the pattern of a bunch of unique tokens in a dataset (in this case, words online) and riffs on those patterns to make outputs. It will never learn math this way, no matter how much training you give it. BUT we have already solved computers doing math with regular rule-based algorithms. The way to solve the math problem is to filter inputs and send some to the GPT NN and some to a regular algorithm (this is what Google search does now, for example). GPT is an amazing tool that can do a bunch of amazing stuff, but it will never do everything (the metaphor I always give is that your prefrontal cortex is the most complex part of your brain, but it will never learn how to beat your heart).
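The routing idea above (filter inputs, send some to a regular algorithm and the rest to the NN) can be sketched roughly as follows. This is only an illustration under assumptions: `evaluate_arithmetic`, `route` and the `language_model` callable are hypothetical names, not anything GPT or Google search actually exposes, and the "filter" here is simply "does the input parse as plain arithmetic?".

```python
import ast
import operator

# Operators the rule-based path is allowed to evaluate (hypothetical whitelist).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def evaluate_arithmetic(text):
    """Evaluate a plain arithmetic expression, or raise ValueError."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Accept only int/float literals (this also rejects True/False).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("not plain arithmetic")

    try:
        tree = ast.parse(text.strip(), mode="eval")
    except SyntaxError as exc:
        raise ValueError("not plain arithmetic") from exc
    return walk(tree)


def route(query, language_model):
    """Send arithmetic to the rule-based evaluator, everything else to the model."""
    try:
        return str(evaluate_arithmetic(query))
    except (ValueError, ZeroDivisionError):
        # Assumption: language_model is any callable taking and returning a string.
        return language_model(query)
```

For example, `route("12 * (3 + 4)", model)` is answered by the rule-based path without calling `model`, while `route("why is the sky blue", model)` falls through to it.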
Not so. For example, "grokking" is the phenomenon where, with enough training, a NN eventually undergoes a phase change from memorising the data to learning the general rules underlying it [1].
Grokking isn't actually desirable: it's better for the model to go more directly and quickly to learning the general rule, which is achievable in toy problems (called "comprehension" in [2]).
I feel that people have forgotten that deep learning is so powerful because it performs feature/representation learning, not because it can memorise (although that's powerful too). IMO representation learning is the proper definition of 'deep learning'.
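To make the memorising vs. general-rule distinction concrete, here is a minimal sketch on a small algorithmic dataset of the kind studied in [1]. It does not train a network. It only contrasts two hand-written predictors, a lookup table and the actual rule, on held-out examples. All function names, the modulus p = 97 and the 50% split are illustrative assumptions.

```python
import random


def modular_addition_table(p):
    """Every equation a + b = c (mod p), as (a, b, c) triples."""
    return [(a, b, (a + b) % p) for a in range(p) for b in range(p)]


def split(examples, train_fraction, seed=0):
    """Shuffle deterministically and split into (train, test)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def memoriser(train):
    """Pure lookup table: returns None for any pair it has not seen."""
    table = {(a, b): c for a, b, c in train}
    return lambda a, b: table.get((a, b))


def general_rule(p):
    """The underlying rule a generalising model would have to represent."""
    return lambda a, b: (a + b) % p


def accuracy(predict, examples):
    """Fraction of examples where the prediction matches the label."""
    return sum(predict(a, b) == c for a, b, c in examples) / len(examples)


p = 97  # illustrative choice of modulus
train, test = split(modular_addition_table(p), train_fraction=0.5)

lookup = memoriser(train)
rule = general_rule(p)

print("memoriser train/test:", accuracy(lookup, train), accuracy(lookup, test))
print("rule      train/test:", accuracy(rule, train), accuracy(rule, test))
```

The lookup table is perfect on the training pairs and useless on the held-out ones, while the rule is perfect on both. In these terms, grokking is a network's held-out accuracy moving from the first regime to the second late in training.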
[1] Power et al., "Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets", https://arxiv.org/abs/2201.02177
[2] Liu et al., "Towards Understanding Grokking: An Effective Theory of Representation Learning", https://arxiv.org/abs/2205.10343