There was a time, long ago, when a neural network had just 2 layers: the input layer and the output layer, with full connections between the two. There's an old proof that, if you don't have nonlinearities, any arbitrarily deep neural net has an equivalent 2-layer net (each layer is just a matrix multiplication, and a product of matrices is itself a single matrix).
So the real difference between modern "deep neural networks" and the old "neural networks" is not really the depth, even though depth is the visible change. Under the old definition, a 2-layer net could match a net of any depth, so you wouldn't use more layers. What actually matters is the nonlinearity (the tanh/sigmoid/relu operation) applied between layers, because that is what stops the layers from collapsing into one.
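To make the collapse argument concrete, here is a minimal sketch in plain Python. The weights `W1`, `W2`, `W3` and the helper names are made up for illustration, and biases are left out for brevity (with biases each layer is affine rather than linear, and the same collapse still happens). It shows a 3-layer net without nonlinearities matching a single matrix, and the same net with relu failing a basic linearity check.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(w, x):
    """Apply matrix w to vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def relu(v):
    """Elementwise max(0, t)."""
    return [max(0, t) for t in v]

# Hypothetical integer weights, so the arithmetic is exact.
W1 = [[1, -2], [3, 1], [0, 2]]   # 2 inputs -> 3 hidden units
W2 = [[2, 0, -1], [1, 1, 1]]     # 3 hidden -> 2 hidden units
W3 = [[1, -1]]                   # 2 hidden -> 1 output

def deep_linear(x):
    """Three stacked layers with no nonlinearity."""
    return matvec(W3, matvec(W2, matvec(W1, x)))

# Collapse the three layers into one matrix: W = W3 * W2 * W1.
W = matmul(W3, matmul(W2, W1))

def shallow_linear(x):
    """A single layer using the collapsed matrix."""
    return matvec(W, x)

def deep_relu(x):
    """The same three layers, with relu after the first two."""
    return matvec(W3, relu(matvec(W2, relu(matvec(W1, x)))))

# Without nonlinearities, the deep and shallow nets agree on every input tried.
for x in ([1, 1], [-1, -1], [2, -3], [0, 5]):
    assert deep_linear(x) == shallow_linear(x)

# With relu, the net is no longer linear: f(x) + f(-x) should equal f(0)
# for any linear map, but here it does not.
x, neg_x = [1, 1], [-1, -1]
lhs = deep_relu(x)[0] + deep_relu(neg_x)[0]
rhs = deep_relu([0, 0])[0]
assert lhs != rhs
print("collapsed matrix:", W)
print("relu net: f(x) + f(-x) =", lhs, "but f(0) =", rhs)
```

Since no single matrix can reproduce a map that fails that check, the relu version cannot be collapsed the same way.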