Hacker News
by garyiskidding 822 days ago
The 2013 Zeiler and Fergus paper also explains this. It visualizes the network's activations, including how they evolve during training, and shows which features each layer learns to detect.

Paper: https://arxiv.org/pdf/1311.2901.pdf
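For anyone curious how they map a layer's activations back to pixel space: the paper's "deconvnet" runs each layer in reverse. It unpools using the max locations ("switches") recorded on the forward pass, applies a ReLU, and then applies the transposed conv filters. Below is a rough toy sketch of that idea in 1D with a single conv layer and plain Python lists. The helper names are my own (hypothetical), and this is not the authors' code.

```python
# Toy 1D deconvnet-style projection (illustrative sketch, not the paper's code).
# Forward: conv -> ReLU -> max-pool (recording switches).
# Back to input space: unpool via switches -> ReLU -> transposed conv.

def conv1d_valid(x, w):
    """Valid 1D cross-correlation: out[i] = sum_j x[i + j] * w[j]."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def relu(v):
    return [max(0.0, a) for a in v]

def max_pool(v, size):
    """Non-overlapping max-pool; also returns each window's argmax position (the 'switch')."""
    pooled, switches = [], []
    for start in range(0, len(v) - size + 1, size):
        window = v[start:start + size]
        j = max(range(size), key=lambda t: window[t])
        pooled.append(window[j])
        switches.append(start + j)
    return pooled, switches

def unpool(pooled, switches, length):
    """Put each pooled value back at its recorded position; zeros everywhere else."""
    out = [0.0] * length
    for val, idx in zip(pooled, switches):
        out[idx] = val
    return out

def conv1d_transpose(y, w):
    """Transpose of conv1d_valid: maps a length n-k+1 signal back to length n."""
    out = [0.0] * (len(y) + len(w) - 1)
    for i, a in enumerate(y):
        for j, wj in enumerate(w):
            out[i + j] += a * wj
    return out

def project_to_input(x, w, pool_size, keep_index):
    """Project a single pooled activation back onto the input (hypothetical helper)."""
    a = relu(conv1d_valid(x, w))
    pooled, switches = max_pool(a, pool_size)
    # Zero out every pooled unit except the one we want to visualize.
    isolated = [v if i == keep_index else 0.0 for i, v in enumerate(pooled)]
    r = relu(unpool(isolated, switches, len(a)))
    return conv1d_transpose(r, w)
```

For example, with `x = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]`, `w = [1.0, 1.0]`, and `pool_size=2`, projecting the second pooled unit (`keep_index=1`) gives `[0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0]`. The result is nonzero only over the input positions that unit actually "saw". The real model stacks this per layer in 2D, which is how the paper gets its layer-by-layer feature pictures.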