Hacker News new | ask | show | jobs
by mannigfaltig 3157 days ago
Wait, that can’t be wrong because that is literally what DO does. It is a convex hull regularizer around the network activations using noise. That is also why dropout does not solve susceptibility to adversarial examples: It merely extends the regions that the NN generalizes to outward; but that is limited because high-dimensional spaces are counter-intuitively large and the noise required to cover a descent fraction of the “unmapped” space would completely prevent learning. AFAIK, Yarin Gal merely provides a Bayesian interpretation of the noise.
1 comments

IIRC, his "Bayesian interpretation of the noise" actually shows that dropout performs approximate integration over model parameters. As he says, dropout doesn't work because of the noise but despite the noise.

https://youtu.be/3ONLxYeM1Sc?t=19m21s

That seems like a strange/unnecessary way to put it because DO is noise.