by moffkalast 1057 days ago
Isn't dropout just there to avoid overfitting? This is more like a mixture of experts type architecture.

That is one lens to view it through. Reducing co-adaptation is another, and an intuitive one: generalization improves when a neuron has to be useful in many contexts instead of relying on specific other neurons to carry the weight, if you'll pardon the pun.
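
To make the co-adaptation point concrete, here is a minimal sketch of dropout on a plain list of activations. The function name `dropout` and the parameters `p_drop`, `training` and `rng` are made up for illustration. Rescaling the surviving units during training is the common "inverted" variant and is my assumption here, not necessarily the exact procedure in the linked paper.

```python
import random

def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Illustrative inverted dropout over a list of floats (hypothetical helper)."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        # At inference time every unit is kept and nothing is rescaled.
        return list(activations)
    keep = 1.0 - p_drop
    # Each unit is zeroed independently with probability p_drop, so no unit
    # can count on any particular neighbour being present on a given pass.
    # Survivors are scaled by 1/keep so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Because a fresh random mask is drawn on every call, each forward pass sees a different subset of units, which is what pushes every neuron to be useful across many contexts.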

Improving neural networks by preventing co-adaptation of feature detectors: https://arxiv.org/abs/1207.0580