Technically yes; most often it's about stacking more layers in neural networks, making them "deep". However, there is some merit to the new hype, since stacking more layers worked way better than anyone previously working with neural networks and ML thought it would. In theory you could generalize deep learning to methods other than neural networks: it's basically about creating way more complex models than those used in previous research and feeding them lots of data, thereby assuming less about the problem and letting the model figure it out.
> it's basically about creating way more complex models than those used in previous research and feeding them lots of data
Those are instructions for over-fitting. Deep learning neural networks escape from this problem somehow, but it's not a given that other models would escape it too.
This is true! Overfitting is definitely one of the biggest problems with deep learning. Some techniques to avoid it have been developed, such as dropout (randomly zeroing units during training, which injects noise) and early stopping. But in general this is why deep learning requires huge amounts of data: a deep learning model will overfit if not given enough. That is also why (at this time) it only performs well for certain problems where the ratio between available data and problem complexity is high enough.
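To make the dropout part concrete, here is a minimal sketch of the common "inverted dropout" variant in plain Python. The function name `dropout`, the parameter `p_drop`, and the toy activations are my own choices for illustration, not anything from a particular library.

```python
import random

def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop while training."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        # At test time the layer is left untouched.
        return list(activations)
    keep = 1.0 - p_drop
    # Survivors are scaled by 1/keep so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)            # fixed seed, only for reproducibility
hidden = [1.0] * 10000            # made-up activations of one hidden layer
dropped = dropout(hidden, p_drop=0.5, rng=rng)
mean_train = sum(dropped) / len(dropped)
same_at_test = dropout(hidden, p_drop=0.5, rng=rng, training=False)
```

Each training pass sees a different random subset of units, so the network cannot lean too heavily on any single one; at test time nothing is dropped.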
The traditional way to avoid overfitting is to reduce the number of independent variables, shrink coefficients towards zero, or otherwise limit the complexity of the model.
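As a rough illustration of "shrink coefficients towards zero", here is one-feature ridge regression without an intercept, where the closed form is easy to read. The data points and the penalty values `lam` are made up for the example.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)**2) + lam * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # The penalty lam appears only in the denominator, pulling w towards 0.
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]          # roughly y = 2x, invented numbers
penalties = (0.0, 1.0, 10.0, 100.0)
slopes = [ridge_slope(xs, ys, lam) for lam in penalties]

for lam, w in zip(penalties, slopes):
    print(f"lam={lam:6.1f}  slope={w:.3f}")
```

With `lam = 0` this is ordinary least squares; as `lam` grows the fitted slope moves steadily towards zero, trading some fit on the training data for a simpler model.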
With deep neural networks the approach is different. Instead of trying to find the global optimum of the training objective (which is too hard, and would also leave the model grossly overfit), the algorithm stops much earlier. Such "underfit" models seem to generalize much better.
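The "stops much earlier" part is usually done by watching the loss on a held-out validation set. Below is a minimal sketch of that stopping rule, assuming a patience-based criterion; the function name, the `patience` parameter, and the loss values are hypothetical and only there to show the logic.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch) for a stream of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; the weights from best_epoch would be kept.
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Ran out of epochs before the patience budget was used up.
    return best_epoch, last_epoch

# Invented validation curve: improves, then starts to rise (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
```

In this made-up run the training loss could keep falling for many more epochs, but the rule keeps the model from epoch 3 and never pays for the later ones.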
They mostly escape from that by using huge amounts of data and massive computing resources. Deep learning became feasible because of the huge amounts of data that companies like Facebook, Google, Apple and others have collected.
Deep networks have more layers than the previous generation. They seem to work better in engineering practice than mathematically equivalent shallow, wide networks.