| HN Mirror

So it's not common to use a layer-by-layer training approach for deep nets? I thought that was one of the main things that made a huge difference and enabled the "deep" revolution. Anyways, isn't vanishing gradients still a problem? If so, how do people use these frameworks for deep nets? Otherwise, how is the problem resolved? I thought vanishing gradients was an issue for anything with more than 2 or 3 layers.