1. People overfit the toy datasets (MNIST) to zero training loss all the time. Maybe you meant a "hard" dataset. 2. You clearly have no idea what you are talking about.
This paper is trying to explain part of why neural networks generalize well by proving that a network satisfying their conditions converges to zero training loss under gradient descent. It isn't remotely meant to be practical. IT IS A THEORETICAL PAPER. And comparing it to 1-nearest-neighbor is so silly it isn't even wrong. Edit: #1 is actually an entire research direction in machine learning theory, FYI. It is possible to get neural networks that massively overfit but still generalize (which is weird). https://arxiv.org/pdf/1611.03530.pdf That paper was really famous. It showed you can get zero training loss even when you replace the labels with random noise. Edit 2: I am sorry to be harsh. It is just hard to read such arrant nonsense.
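If you want to see the random-label effect without training a deep net, here is a minimal sketch. To be clear about the assumptions: this is not the setup from either paper. It swaps gradient descent on a real network for a fixed random ReLU feature layer plus a least-squares fit of the output weights, and the function name `fit_random_labels`, the sizes, and the seed are all made up for illustration. The only point is that a model with far more parameters than training points can hit (numerically) zero training loss on labels that are pure noise, while test accuracy stays at chance.

```python
import numpy as np


def fit_random_labels(n_train=200, n_test=200, dim=20, width=2000, seed=0):
    """Fit pure-noise labels with an overparameterized random-feature model.

    Hypothetical illustration only: random ReLU features + least squares,
    NOT gradient descent on a deep network.
    """
    rng = np.random.default_rng(seed)

    # Gaussian inputs; labels are coin flips, independent of the inputs.
    X_train = rng.standard_normal((n_train, dim))
    X_test = rng.standard_normal((n_test, dim))
    y_train = rng.choice([-1.0, 1.0], size=n_train)
    y_test = rng.choice([-1.0, 1.0], size=n_test)

    # Fixed random hidden layer with width >> n_train (overparameterized).
    W = rng.standard_normal((dim, width)) / np.sqrt(dim)

    def features(X):
        return np.maximum(X @ W, 0.0)  # ReLU

    # Minimum-norm least-squares solution for the output weights.
    coef, *_ = np.linalg.lstsq(features(X_train), y_train, rcond=None)

    train_pred = features(X_train) @ coef
    test_pred = features(X_test) @ coef

    train_mse = float(np.mean((train_pred - y_train) ** 2))
    train_acc = float(np.mean(np.sign(train_pred) == y_train))
    test_acc = float(np.mean(np.sign(test_pred) == y_test))
    return train_mse, train_acc, test_acc


if __name__ == "__main__":
    train_mse, train_acc, test_acc = fit_random_labels()
    print(f"train MSE: {train_mse:.2e}")
    print(f"train accuracy: {train_acc:.3f}")
    print(f"test accuracy: {test_acc:.3f}")
```

Running it should give a training MSE that is zero up to floating-point error and a test accuracy around 0.5, since there is nothing to learn from noise labels. That is the memorization half of the story; the weird part the linked paper points at is that the same kind of model can still generalize when the labels are real.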
"The current paper focuses on the train loss, but does not address the test loss. It would be an important problem to show that gradient descent can also find solutions of low test loss. In particular, existing work only demonstrate that gradient descent works under the same situations as kernel methods and random feature methods [Daniely, 2017, Li and Liang, 2018]."