Hacker News
by version_five 1044 days ago
Anything would be better than "grokking".

From what I gather they're talking about double descent which afaik is the consequence of overparameterization leading to a smooth interpolation between the training data as opposed to what happens in traditional overfitting. Imagine a polynomial fit with the same degree as the number of data points (swinging up and down wildly away from the data) compared with a much higher degree fit that could smoothly interpolate between the points while still landing right on them.
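To make the polynomial picture concrete, here is a rough sketch under my own assumptions: a Legendre basis, 10 noisy samples of a sine, and NumPy's `lstsq`, which returns the minimum-norm solution when there are more coefficients than points. Note that a degree n-1 polynomial already has n coefficients, so that is the "just enough parameters" case. Both fits land exactly on the data. How smooth the overparameterized one looks depends heavily on the basis and on which norm is minimized, so treat the printed numbers as something to poke at, not a demonstration of double descent itself.

```python
import numpy as np
from numpy.polynomial import legendre as L

rng = np.random.default_rng(0)
n = 10  # number of training points (arbitrary choice for illustration)
x_train = np.linspace(-1.0, 1.0, n)
y_train = np.sin(np.pi * x_train) + 0.1 * rng.standard_normal(n)


def min_norm_fit(degree):
    """Fit Legendre coefficients; lstsq picks the minimum-L2-norm
    solution when the system is underdetermined (degree + 1 > n)."""
    A = L.legvander(x_train, degree)  # shape (n, degree + 1)
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef


coef_exact = min_norm_fit(n - 1)  # exactly as many coefficients as points
coef_over = min_norm_fit(5 * n)   # heavily overparameterized

x_dense = np.linspace(-1.0, 1.0, 400)
for name, c in [("degree n-1", coef_exact), ("degree 5n", coef_over)]:
    train_err = np.max(np.abs(L.legval(x_train, c) - y_train))
    peak = np.max(np.abs(L.legval(x_dense, c)))
    print(f"{name}: max train error {train_err:.2e}, "
          f"coef norm {np.linalg.norm(c):.3f}, max |fit| on grid {peak:.3f}")
```

The one thing guaranteed here is that the overparameterized fit has a coefficient norm no larger than the exact-degree fit, since the exact-degree solution padded with zeros is one of the candidates the minimum-norm solver chooses among.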

None of this is what I would call generalization, it's good interpolation, which is what deep learning does in a very high dimensional space. It's notoriously awful at extrapolating, ie generalizing to anything without support in the training data.
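A toy version of the interpolation vs. extrapolation gap, with a degree-7 polynomial standing in for the network (my assumption, purely for illustration; the target function and ranges are arbitrary too). The fit is scored once between the training points and once on a range it never saw.

```python
import numpy as np

# Train only on [-1, 1].
x_train = np.linspace(-1.0, 1.0, 50)
y_train = np.sin(np.pi * x_train)

# Least-squares polynomial fit; a stand-in for any flexible model.
coef = np.polyfit(x_train, y_train, deg=7)

x_inside = np.linspace(-0.95, 0.95, 200)  # within the training support
x_outside = np.linspace(1.5, 2.5, 200)    # no support in the training data

interp_err = np.max(np.abs(np.polyval(coef, x_inside) - np.sin(np.pi * x_inside)))
extrap_err = np.max(np.abs(np.polyval(coef, x_outside) - np.sin(np.pi * x_outside)))

print(f"max error inside training range:  {interp_err:.2e}")
print(f"max error outside training range: {extrap_err:.2e}")
```

A polynomial blows up outside its fitting range in a way a network with bounded activations would not, so the size of the gap here says nothing quantitative about deep learning. It only shows the shape of the claim.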

2 comments

> double descent is a different phenomenon from grokking
Nope, they are the same. Grokking is just the case where the KL divergence between the information representable under the network's implicit biases and the data is extremely high (i.e. the network is poorly designed or poorly oriented for the task).

It's an informal term that not everyone accepts. Double descent is acceptable as it describes a general phenomenon that is a natural consequence of a phase transition during neural network training. Grokking is like, to me, the 'fetch' of neural network terms. It's not new, it adds a seeming layer of separation from double descent (which it is, just very delayed), and it's not really accepted by everyone.

I personally do not like it at all. Especially because language affects _our_ implicit biases about what neural networks can and cannot do. We've already seen that their capacities and performance can be pushed way beyond what we traditionally expect of them.

But to summarize, they are the same. And this is also why we need good terminology, because adopting and boosting improper terminology induces excess regret in the information exchange surface between agents, in a game-theoretic sense, in this lovely landscape of the ML world.

> It's notoriously awful at extrapolating, ie generalizing to anything without support in the training data.

Scientists are also pretty lousy at making new discoveries without labs. They just need training data.