Hacker News
by mark_l_watson 4962 days ago
That is correct. The problem is that the gradients get smaller and smaller as you backpropagate towards the input layer, so learning in the early layers of the net is slow. Hinton has a lot of good material about this in his Coursera lectures.
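A minimal sketch of the effect, assuming a toy chain of scalar sigmoid units that all share the same weight (`w = 1.0` by default; the function names are hypothetical, for illustration only). The sigmoid derivative never exceeds 0.25, so with unit weights the chain rule multiplies in a factor of at most 0.25 at every layer on the way back:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def backprop_grad_magnitudes(x: float, depth: int, w: float = 1.0) -> list[float]:
    """Return |d output / d a_k| for a chain of `depth` scalar sigmoid units.

    Index 0 is the output itself (gradient 1.0); the last entry is the
    gradient with respect to the network input x.
    Assumes every layer computes a_i = sigmoid(w * a_{i-1}) with the same w.
    """
    # Forward pass: keep each pre-activation z_i for use in the backward pass.
    activations = [x]
    pre_activations = []
    for _ in range(depth):
        z = w * activations[-1]
        pre_activations.append(z)
        activations.append(sigmoid(z))

    # Backward pass: chain rule, multiply by w * sigmoid'(z_i) at each layer.
    grad = 1.0
    grads = [grad]
    for z in reversed(pre_activations):
        s = sigmoid(z)
        grad *= w * s * (1.0 - s)  # sigmoid'(z) = s * (1 - s) <= 0.25
        grads.append(abs(grad))
    return grads


if __name__ == "__main__":
    for k, g in enumerate(backprop_grad_magnitudes(x=0.5, depth=10)):
        print(f"{k} layers back from the output: |grad| = {g:.3e}")
```

After 10 layers the gradient reaching the input is at most 0.25**10 (about 1e-6) of the gradient at the output. In a real net, the weight magnitudes also enter each factor, so large weights can make gradients explode instead of vanish; this toy fixes the weights only to isolate the activation's contribution.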