|
|
|
|
|
by DavidSJ
1514 days ago
|
|
With over-parameterized neural networks, the problem essentially becomes convex and even linear [1], and in many contexts provably converges to a global minimum [2], [3]. The question then becomes: why does this generalize [4], given that the classical theory of Vapnik and others [5] becomes vacuous, no longer guaranteeing lack of over-fitting? This is less well understood, although there is recent theoretical work here too. [1] Lee et al (2019). Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent. https://proceedings.neurips.cc/paper/2019/hash/0d1a9651497a3... [2] Allen-Zhu et al (2019). A convergence theory for deep learning via over-parameterization. https://proceedings.mlr.press/v97/allen-zhu19a.html [3] Du et al (2019). Gradient Descent Finds Global Minima of Deep Neural Networks. http://proceedings.mlr.press/v97/du19c.html [4] Zhang et al (2016). Understanding deep learning requires rethinking generalization. [5] Vapnik (1999). The nature of statistical learning theory. https://arxiv.org/abs/1611.03530 |
|