by Isinlor
1596 days ago
The point of the paper is to show that a NN can keep learning long after it has fully memorized the training set. This behavior goes against the current paradigm of thinking about training NNs. It is just very unexpected, much as double descent is unexpected from the point of view of classical statistics, where more parameters lead to more overfitting. They could have split the validation set into separate validation and test sets, but I don't know what that would achieve in their case. Fig. 1 (center) shows different train / validation splits. Fig. 2 shows a sweep over different optimization algorithms, if you are concerned about overfitting the hyperparameters. But to me the really interesting one is Fig. 3, which shows that the NN learned the structure of the problem.
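For anyone unsure what splitting the held-out data into separate validation and test sets would look like in practice, here is a minimal sketch. It is not the paper's code. The toy dataset (all pairs `(a, b)` labeled with `(a + b) mod p`), the modulus `p = 7`, the split fractions, and the helper name `three_way_split` are all assumptions made up for illustration.

```python
import random


def three_way_split(examples, train_frac=0.5, val_frac=0.25, seed=0):
    """Shuffle examples and cut them into train / validation / test parts.

    Hypothetical helper: the fractions and seed are illustrative defaults.
    Whatever is left after the train and validation cuts becomes the test set.
    """
    rng = random.Random(seed)       # local RNG so the split is reproducible
    shuffled = list(examples)       # copy, so the caller's list is not mutated
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Toy algorithmic dataset (an assumption): every pair (a, b) with label (a + b) mod p.
p = 7
examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]

train, val, test = three_way_split(examples)
print(len(train), len(val), len(test))
```

The idea would be to tune hyperparameters against `val` only and report the final number on `test`, so the reported score is not the same one used for model selection.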
|
That is the claim in the paper. I don't understand how it is supported by measuring results on the validation set.
Figure 3 looks nice, but it doesn't say anything on its own. I don't know the best way to interpret it. The paper offers an interpretation that convinces you, but not me. Sorry, this kind of work is too fuzzy for me. What happened to good, old-fashioned proofs?