by YeGoblynQueenne
1596 days ago
As far as I can tell from a quick heuristic perusal, the "Generalization Beyond Overfitting" paper reports "generalisation" _on the validation set_. That's not particularly impressive and it's not particularly "generalisation" either. Actually, I really don't grok this (if I may). I often see deep learning work reporting generalisation on the validation set. What's up with that? Why is generalisation on the validation set more interesting than on the test set, let alone OOD data?
This behavior goes against the current paradigm of how we think about training NNs. It is very unexpected, much as double descent is unexpected from the classical statistics point of view, in which more parameters lead to more overfitting.
They could have split the held-out set into separate validation and test sets, but I don't know what that would achieve in their case.
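For concreteness, here is a minimal sketch of what such a three-way split could look like. It assumes a modular-addition task a + b (mod p) with p = 97. The operation, the value of p, the split fractions and the helper name `split_three_way` are illustrative assumptions, not the paper's exact configuration. The idea is that hyperparameters are chosen on the validation portion, and the test portion is evaluated only once at the end.

```python
import random

# Assumption: a modular-addition task a + b (mod P) with P = 97, chosen only
# for illustration; the real experiments cover several binary operations.
P = 97

def split_three_way(train_frac=0.5, val_frac=0.25, seed=0):
    """Hypothetical helper: split every (a, b, c) equation into disjoint
    train / validation / test lists."""
    equations = [(a, b, (a + b) % P) for a in range(P) for b in range(P)]
    rng = random.Random(seed)          # fixed seed for a reproducible split
    rng.shuffle(equations)
    n_train = int(train_frac * len(equations))
    n_val = int(val_frac * len(equations))
    train = equations[:n_train]
    val = equations[n_train:n_train + n_val]   # used for model/hyperparameter selection
    test = equations[n_train + n_val:]         # held back, evaluated once at the end
    return train, val, test

train, val, test = split_three_way()
print(len(train), len(val), len(test))  # prints "4704 2352 2353"
```

With 97 × 97 = 9409 equations, the split gives 4704 / 2352 / 2353 examples. Whether a separate test set would change anything here is exactly the open question above: every equation comes from the same finite table, so "test" data is drawn from the same distribution as the validation data.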
Fig. 1 (center) shows different train/validation splits. Fig. 2 shows a sweep over different optimization algorithms, if you are concerned about overfitting to hyperparameters.
But to me, the really interesting part is Fig. 3, which shows that the NN learned the structure of the problem.