Hacker News new | ask | show | jobs
by discardorama 3973 days ago
To expand on this some more: for a long time, thanks to Cybenko's theorem[1], people just used 1 hidden layer in their neural networks (also because computing was sloowww..). So, your typical NN architecture was input_layer --> hidden_layer --> output_layer.

Eventually, people realized that you could improve performance by adding more hidden layers. So while theoretically Cybenko was correct, practically stacking a bunch of hidden layers made more sense. These network architectures with stacks of hidden layers were then labelled as "deep" neural networks.

[1] https://en.wikipedia.org/wiki/Universal_approximation_theore...