|
|
|
|
|
by version_five
1044 days ago
|
|
Anything would be better than "grokking". From what I gather they're talking about double descent which afaik is the consequence of overparameterization leading to a smooth interpolation between the training data as opposed to what happens in traditional overfitting. Imagine a polynomial fit with the same degree as the number of data points (swinging up and down wildly away from the data) compared with a much higher degree fit that could smoothly interpolate between the points while still landing right on them. None of this is what I would call generalization, it's good interpolation, which is what deep learning does in a very high dimensional space. It's notoriously awful at extrapolating, ie generalizing to anything without support in the training data. |
|