by RSchaeffer 3161 days ago
Go watch Yarin Gal's talk on dropout in neural networks. He shows pretty convincingly that the belief that dropout reduces network overfitting by introducing noise is wrong.
2 comments

Wait, that can’t be wrong because that is literally what dropout (DO) does. It is a convex hull regularizer around the network activations using noise. That is also why dropout does not solve susceptibility to adversarial examples: it merely extends the regions that the NN generalizes to outward, but that is limited because high-dimensional spaces are counter-intuitively large, and the noise required to cover a decent fraction of the “unmapped” space would completely prevent learning. AFAIK, Yarin Gal merely provides a Bayesian interpretation of the noise.
IIRC, his "Bayesian interpretation of the noise" actually shows that dropout performs approximate integration over model parameters. As he says, dropout doesn't work because of the noise but despite the noise.

https://youtu.be/3ONLxYeM1Sc?t=19m21s
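
For anyone who wants to see what "approximate integration over model parameters" can look like in practice, here is a rough sketch of the Monte Carlo dropout idea: leave dropout switched on at prediction time and average many stochastic forward passes. The function name `mc_dropout_predict`, the two-layer ReLU network, the random weights, and the values of `p_drop` and `n_samples` are all made up for illustration; none of them come from the talk.

```python
import numpy as np

def mc_dropout_predict(x, W1, W2, p_drop=0.5, n_samples=100, rng=None):
    """Average n_samples stochastic forward passes with dropout left on.

    Hypothetical helper for illustration only; the architecture and
    hyperparameters are assumptions, not taken from the talk.
    """
    rng = np.random.default_rng() if rng is None else rng
    outputs = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ W1)            # hidden ReLU layer
        mask = rng.random(h.shape) >= p_drop   # keep each unit with prob 1 - p_drop
        h = h * mask / (1.0 - p_drop)          # inverted-dropout scaling
        outputs.append(h @ W2)                 # linear output layer
    outputs = np.stack(outputs)                # shape: (n_samples, batch, out_dim)
    # The sample mean approximates the predictive mean; the sample variance
    # is a rough indicator of model uncertainty.
    return outputs.mean(axis=0), outputs.var(axis=0)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 16))   # made-up weights standing in for a trained net
W2 = rng.normal(size=(16, 1))
x = rng.normal(size=(3, 4))     # batch of 3 inputs with 4 features each
mean, var = mc_dropout_predict(x, W1, W2, rng=rng)
```

Each pass samples a different mask, i.e. a different thinned network, so the average is a Monte Carlo estimate over sampled networks rather than a single noisy prediction.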

That seems like a strange/unnecessary way to put it because DO is noise.
I thought the point was that dropout effectively learns an ensemble of all... erm... subtopologies (if that's the right word) of your network?
You can also just call it a subgraph; not all of them, but exponentially many.
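
To make the "exponentially many subgraphs" point concrete, here is a toy sketch that enumerates every dropout mask over a small hidden layer and averages the resulting sub-networks. Everything in it (the 8-unit layer, the random weights, the variable names) is an assumption for illustration. With a keep probability of 0.5, every mask is equally likely, so a plain average over all masks is the exact ensemble mean.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 8                          # small enough to enumerate every mask
W1 = rng.normal(size=(4, n_hidden))   # made-up weights for illustration
W2 = rng.normal(size=(n_hidden, 1))
x = rng.normal(size=(1, 4))

h = np.maximum(0.0, x @ W1)           # hidden ReLU activations before dropout

# Every binary mask over the hidden units defines one sub-network.
masks = list(itertools.product([0.0, 1.0], repeat=n_hidden))
n_subnetworks = len(masks)            # 2 ** n_hidden

# Exact ensemble mean: average the output of every sub-network.
ensemble_mean = np.mean([(h * np.array(m)) @ W2 for m in masks], axis=0)

# Single pass through the full network with activations scaled by the
# keep probability (0.5), i.e. the usual test-time weight scaling.
scaled_output = (0.5 * h) @ W2
```

Here `ensemble_mean` and `scaled_output` agree up to floating-point error, but only because the output layer is linear in the dropped units. With nonlinear layers after the dropout, the single scaled pass is just an approximation to the ensemble average, and with realistic layer widths nobody can enumerate the masks anyway, which is why only a sampled subset of the sub-networks is ever visited during training.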