Hacker News
by nrmn 3717 days ago
No, they all use the same general principle of backpropagation for training. Different flavours of optimizers exist, each with its own tweaks and additions to speed training up.

Relevant file in project: https://github.com/tflearn/tflearn/blob/0.1.0/tflearn/optimi...
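
To illustrate the point above, here is a minimal sketch of two optimizer "flavours" consuming the same gradient. It is not tflearn's code: the toy loss, the function names (`grad`, `sgd_step`, `momentum_step`, `run_sgd`, `run_momentum`) and the hyperparameters `lr` and `mu` are all assumptions made for the example. In a real network, backpropagation would supply the gradient; only the update rule differs between optimizers.

```python
# Illustrative only: names and values here are hypothetical, not tflearn's API.

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2.
    # In a real net, backpropagation is what computes this quantity.
    return 2.0 * (w - 3.0)

def sgd_step(w, lr=0.1):
    # Plain gradient descent: step directly against the gradient.
    return w - lr * grad(w)

def momentum_step(w, v, lr=0.1, mu=0.9):
    # Momentum: accumulate a velocity from past gradients, then move by it.
    v = mu * v - lr * grad(w)
    return w + v, v

def run_sgd(steps, w=0.0):
    # Apply the plain update repeatedly from a starting weight.
    for _ in range(steps):
        w = sgd_step(w)
    return w

def run_momentum(steps, w=0.0, v=0.0):
    # Apply the momentum update repeatedly, carrying the velocity along.
    for _ in range(steps):
        w, v = momentum_step(w, v)
    return w
```

Both loops call the same `grad`, and both head toward the minimum at w = 3; they differ only in how the gradient is turned into a weight update.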

1 comment

So it's not common to train deep nets layer by layer? I thought that was one of the main things that made a huge difference and enabled the "deep" revolution. Anyway, aren't vanishing gradients still a problem? If so, how do people use these frameworks for deep nets? If not, how was the problem resolved? I thought vanishing gradients were an issue for anything with more than 2 or 3 layers.
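
To make the vanishing-gradient concern concrete, here is a toy sketch under heavy simplifying assumptions: sigmoid activations, the same pre-activation `x` at every layer, and a single scalar `weight` per layer. The helper names (`sigmoid_grad`, `backprop_factor`) are made up for the example. It only shows how the chain-rule product shrinks with depth in that setting; it says nothing about how real frameworks deal with it.

```python
import math

def sigmoid(x):
    # Logistic sigmoid activation.
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def backprop_factor(depth, x=0.0, weight=1.0):
    # Product of the per-layer chain-rule factors weight * sigmoid'(x).
    # Assumes every layer sees the same x and weight (a simplification).
    factor = 1.0
    for _ in range(depth):
        factor *= weight * sigmoid_grad(x)
    return factor
```

With these assumptions each layer multiplies the backpropagated signal by at most 0.25, so `backprop_factor(3)` is already 0.25 cubed and the factor keeps shrinking geometrically as `depth` grows.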