|
|
|
|
|
by discardorama
3973 days ago
|
|
To expand on this some more: for a long time, thanks to Cybenko's theorem[1], people just used 1 hidden layer in their neural networks (also because computing was sloowww..). So, your typical NN architecture was input_layer --> hidden_layer --> output_layer. Eventually, people realized that you could improve performance by adding more hidden layers. So while theoretically Cybenko was correct, practically stacking a bunch of hidden layers made more sense. These network architectures with stacks of hidden layers were then labelled as "deep" neural networks. [1] https://en.wikipedia.org/wiki/Universal_approximation_theore... |
|