by antognini 3575 days ago
In principle, a shallow NN (one hidden layer) can approximate any continuous function to arbitrary accuracy, given enough hidden units. But in practice it has a tendency to overfit and just "memorize" the inputs. The basic idea behind adding more layers is that the early layers can learn very low-level features of the data, and later layers combine those low-level features into higher-level ones. This tends to make the models generalize better.
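
To make "shallow" versus "deep" concrete, here is a rough sketch in plain Python. The layer sizes, the tanh activation, and the helper names (`build_mlp`, `forward`, `count_params`) are all my own choices for illustration, and the weights are random and untrained. It only shows the structural difference: the same parameter budget spent on one wide hidden layer or on three narrow ones. It says nothing by itself about which one generalizes better.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer with a tanh nonlinearity."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for one layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def build_mlp(sizes, seed=0):
    """sizes = [inputs, hidden..., outputs]; one layer per adjacent pair."""
    rng = random.Random(seed)
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def forward(layers, x):
    """Feed x through every layer in order; each layer sees the previous output."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x

def count_params(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

# Illustrative sizes only: 4 inputs, 1 output.
shallow = build_mlp([4, 32, 1])       # one wide hidden layer
deep = build_mlp([4, 8, 8, 8, 1])     # three narrow hidden layers

x = [0.5, -0.2, 0.1, 0.9]
print(count_params(shallow), count_params(deep))
print(forward(shallow, x), forward(deep, x))
```

With these particular sizes both networks have the same number of parameters, so the only difference is how they are arranged: the deep one has to pass everything through intermediate representations before reaching the output.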

A standard example is a face detection algorithm. The first layer does edge detection, the next layer combines edges into corners and simple shapes, the next layer might use those shapes to look for features like eyes, noses, and mouths, and the layer after that might combine those features to look for a whole face.
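
The same idea can be sketched on a toy 1D signal instead of an image. This is a hand-wired illustration, not a trained network: the kernels, the threshold, and the function names (`layer1`, `layer2`, `detect_bump`) are assumptions I picked so the hierarchy is easy to see. Layer 1 detects rising and falling edges; layer 2 fires only when a rising edge is followed by a falling edge a fixed distance later, i.e. it combines two low-level features into a "bump" feature.

```python
def relu(x):
    return max(0.0, x)

def conv1d(signal, kernel):
    """Slide the kernel over the signal (no padding), then apply ReLU."""
    k = len(kernel)
    return [relu(sum(signal[i + j] * kernel[j] for j in range(k)))
            for i in range(len(signal) - k + 1)]

def layer1(signal):
    """Low-level features: where the signal steps up and where it steps down."""
    rising = conv1d(signal, [-1.0, 1.0])
    falling = conv1d(signal, [1.0, -1.0])
    return rising, falling

def layer2(rising, falling, width):
    """Higher-level feature: a rising edge followed `width` steps later by a
    falling edge. The -1.0 bias means one edge alone is not enough to fire."""
    return [relu(rising[i] + falling[i + width] - 1.0)
            for i in range(len(rising) - width)]

def detect_bump(signal, width):
    rising, falling = layer1(signal)
    return max(layer2(rising, falling, width), default=0.0) > 0.0

bump = [0, 0, 1, 1, 1, 0, 0]   # rises, stays up for 3 samples, falls
step = [0, 0, 1, 1, 1, 1, 1]   # rises and never falls
print(detect_bump(bump, width=3))
print(detect_bump(step, width=3))
```

The step signal has the same rising edge as the bump, so layer 1 responds to both, but only the bump satisfies layer 2. A real network learns its filters from data rather than having them written by hand, but the stacking works the same way.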

I wrote a more detailed answer here:

http://stats.stackexchange.com/questions/222883/why-are-neur...