by singulargalaxy
461 days ago
Hard disagree. Your link relies on gradient descent as an explanation, whereas OP explains why optimization is not needed to understand DL generalization. PAC-Bayes and the other countable-hypothesis bounds in OP are also quite different from VC dimension. The whole point of OP seems to be that these other frameworks, unlike VC dimension, can explain generalization even with an arbitrarily flexible hypothesis space.

But once you are ready to do that, algorithmic stability is enough. You then don't need to think about Bayesian ensembles or other proxies/simplifications; you can focus on the specific learning setup you actually have. BTW, algorithmic stability is not a new idea. An early version showed up within a few years of VC theory, in the late 70s, to explain why nearest-neighbor methods generalize (though it wasn't called algorithmic stability then).
If you're interested in this, I'd also recommend [3].
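For a concrete feel of what stability means for nearest neighbors, here's a minimal sketch. It's my own toy setup, not taken from the linked papers: delete one training example, "retrain", and count how many test predictions flip. The 1-D noisy-threshold data and the helpers `knn_predict`, `loo_instability` and `make_data` are all hypothetical, just for illustration. For k-NN a deleted point can only affect test points that had it among their k nearest neighbours, so the average fraction of flipped predictions is at most k/n and shrinks as the training set grows. Note that the argument never refers to the size or VC dimension of the hypothesis space.

```python
import heapq
import random


def knn_predict(train, x, k):
    """Majority vote (labels 0/1) among the k training points nearest to x."""
    neighbors = heapq.nsmallest(k, train, key=lambda p: abs(p[0] - x))
    ones = sum(label for _, label in neighbors)
    return 1 if 2 * ones > len(neighbors) else 0


def loo_instability(train, test_xs, k):
    """Average, over training examples i, of the fraction of test points
    whose prediction changes when example i is deleted from the training set."""
    full = [knn_predict(train, x, k) for x in test_xs]
    total = 0.0
    for i in range(len(train)):
        reduced = train[:i] + train[i + 1:]
        changed = sum(knn_predict(reduced, x, k) != y
                      for x, y in zip(test_xs, full))
        total += changed / len(test_xs)
    return total / len(train)


def make_data(n, noise, rng):
    """Hypothetical 1-D toy problem: label is 1 iff x > 0.5, flipped w.p. `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    test_xs = [rng.random() for _ in range(100)]
    k = 5
    for n in (25, 100, 200):
        train = make_data(n, noise=0.2, rng=rng)
        inst = loo_instability(train, test_xs, k)
        # A deleted point can only matter for test points that had it among
        # their k nearest neighbours, so the average is at most k / n.
        print(f"n={n:4d}  instability={inst:.4f}  bound k/n={k / n:.4f}")
```

The printed instability should stay below k/n for every n, even though the noisy labels mean the classifier fits plenty of "wrong" points.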
[2] https://arxiv.org/abs/1611.03530
[3] https://arxiv.org/abs/1902.04742