by erwald
1037 days ago
"Grok" in AI doesn't quite describe generalization; it's more specific than that. It's more like "delayed and fairly sudden generalization" or something along those lines. There was some discussion of this in the comments of this post[1], which proposes calling the phenomenon "eventual recovery from overfitting" instead.

[1] https://www.lesswrong.com/posts/GpSzShaaf8po4rcmA/qapr-5-gro...
Neural network training [edit: on a fixed point task, as is often the case {such as image->label}] is always (always) necessarily biphasic, so there is no "eventual recovery from overfitting". In my experience, it's just people who are newer to the field, or just noodling around, fundamentally misunderstanding what is happening as their network goes through a very delayed phase change. Unfortunately, these kinds of posts get amplified significantly, because people like chasing the new shiny of some fad-or-another-that-does-not-actually-exist instead of the much more 'boring' (which I find fascinating) math underneath it all.
To me, as someone who specializes in optimizing network training speed, it just indicates poor engineering of the problem on the part of whoever is running the experiments. It is not a new or strange phenomenon; it is a literal consequence of the information theory underlying neural network training.