| HN Mirror



	by RSchaeffer 3161 days ago
	Go watch Yarin Gal's talk on dropout in neural networks. He shows pretty convincingly that the belief that dropout reduces network overfitting by introducing noise is wrong.

2 comments

mannigfaltig 3161 days ago

Wait, that can’t be wrong, because that is literally what dropout (DO) does. It is a convex hull regularizer around the network activations using noise. That is also why dropout does not solve susceptibility to adversarial examples: it merely extends the regions that the NN generalizes to outward, but that is limited because high-dimensional spaces are counter-intuitively large, and the noise required to cover a decent fraction of the “unmapped” space would completely prevent learning. AFAIK, Yarin Gal merely provides a Bayesian interpretation of the noise.
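
To put a rough number on "counter-intuitively large": a minimal sketch, assuming we model the region reached by noise as a ball of radius 0.5 centered in a unit hypercube and ask what fraction of the cube it fills as the dimension grows. The ball-in-cube setup and the function name `ball_fraction_of_cube` are illustrative assumptions, not something from the comment above.

```python
import math

def ball_fraction_of_cube(dim, radius=0.5):
    """Volume of a dim-dimensional ball of the given radius, as a fraction of
    the unit hypercube's volume (which is 1). Computed in log space to avoid
    overflow in the gamma function for large dim."""
    log_volume = (
        (dim / 2) * math.log(math.pi)
        + dim * math.log(radius)
        - math.lgamma(dim / 2 + 1)
    )
    return math.exp(log_volume)

# The inscribed ball fills most of a square but a vanishing share of a
# high-dimensional cube.
for dim in (2, 10, 100):
    print(dim, ball_fraction_of_cube(dim))
```

In 2 dimensions the fraction is pi/4; by 100 dimensions it is effectively zero, which is one way to see why isotropic noise around the training data covers so little of the input space.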


RSchaeffer 3161 days ago

IIRC, his "Bayesian interpretation of the noise" actually shows that dropout performs approximate integration over model parameters. As he says, dropout doesn't work because of the noise but despite the noise.

https://youtu.be/3ONLxYeM1Sc?t=19m21s
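
A minimal sketch of the "approximate integration" reading, assuming a toy network with hand-picked weights (`W1`, `W2`, `KEEP_PROB`, and both function names are hypothetical, not from the talk): keep dropout switched on at prediction time, run many stochastic forward passes, and average them. The average is a Monte Carlo estimate of the expectation over dropout masks, and the spread gives a crude uncertainty signal.

```python
import random
import statistics

# Hypothetical toy network: 2 inputs -> 4 ReLU hidden units -> 1 output.
W1 = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8], [0.2, 0.9]]
W2 = [1.0, -0.5, 0.7, 0.3]
KEEP_PROB = 0.5

def forward(x, mask):
    """One forward pass with a fixed 0/1 dropout mask on the hidden units.
    Kept units are rescaled by 1 / KEEP_PROB (inverted dropout)."""
    out = 0.0
    for w_in, w_out, keep in zip(W1, W2, mask):
        hidden = max(0.0, sum(w * xi for w, xi in zip(w_in, x)))
        out += w_out * hidden * keep / KEEP_PROB
    return out

def mc_dropout_predict(x, n_samples=1000, seed=0):
    """Average many stochastic forward passes, each with a fresh random mask."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        mask = [1 if rng.random() < KEEP_PROB else 0 for _ in W2]
        samples.append(forward(x, mask))
    return statistics.fmean(samples), statistics.pstdev(samples)

mean, spread = mc_dropout_predict([1.0, 2.0])
print(mean, spread)
```

Because the output here is linear in the hidden units, the exact expectation over masks equals the no-dropout output, so the sampled mean should land close to it. In a deeper network that equality no longer holds, and the sampling average is only an approximation.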


mannigfaltig 3161 days ago

That seems like a strange/unnecessary way to put it because DO is noise.


_0ffh 3161 days ago

I thought the point was that dropout effectively learns an ensemble of all... erm... subtopologies (if that's the right word) of your network?


mannigfaltig 3161 days ago

You can also just call them subgraphs; not all of them, but exponentially many.
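
A small sketch of the counting argument, assuming unit-level dropout where each of n droppable units is either kept or removed (the helper names `all_masks` and `masks_seen` are made up for illustration): there are 2**n possible masks, and a training run can only ever sample as many of them as it has steps.

```python
import random
from itertools import product

def all_masks(n_units):
    """Every keep/drop pattern over n_units droppable units."""
    return list(product((0, 1), repeat=n_units))

def masks_seen(n_units, n_steps, keep_prob=0.5, seed=0):
    """Count the distinct masks sampled during n_steps of training."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_steps):
        mask = tuple(1 if rng.random() < keep_prob else 0 for _ in range(n_units))
        seen.add(mask)
    return len(seen)

print(len(all_masks(4)))                     # number of subnetworks for 4 units
print(masks_seen(n_units=20, n_steps=1000))  # distinct masks seen, out of 2**20
```

With 20 units there are over a million subnetworks, yet 1000 training steps visit at most 1000 of them, so the "ensemble" is a shared-weight one in which most members are never trained explicitly.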
