Hacker News new | ask | show | jobs
by jimmy_dean 2609 days ago
Addressing your second question. Informally, dropping nodes fights overfitting by creating subsample architectures of which are essentially thinned out networks of the one you've designed. Having trained on these sub nets means you've effectively combined the learning of a few different models and in doing so have generalized beyond the capabilities of your original "single" architecture.