by voqv 1396 days ago
Is that why it took so long? I was under the impression it was because of vanishing gradients in backprop once you stack a large number of layers (the "deep" in deep neural networks).
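For anyone who wants to see the effect I mean, here is a rough toy sketch, not a real network: a chain of scalar sigmoid layers, all sharing one weight. The function name `input_gradient` and the choices `weight=1.0` and `x=0.5` are my own assumptions, picked only for illustration. Since the sigmoid's derivative never exceeds 0.25, each layer multiplies the backpropagated gradient by at most 0.25 here, so it shrinks geometrically with depth.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def input_gradient(depth, weight=1.0, x=0.5):
    """Gradient of the final activation w.r.t. the input for a toy chain
    of `depth` scalar layers a_next = sigmoid(weight * a).

    Hypothetical illustration only: one scalar per layer, shared weight.
    """
    a = x
    grad = 1.0
    for _ in range(depth):
        a = sigmoid(weight * a)
        # Chain rule: d sigmoid(z)/dz = a * (1 - a), times dz/da_prev = weight.
        grad *= weight * a * (1.0 - a)
    return grad


# Print the gradient magnitude for a few depths to see it shrink.
for depth in (1, 5, 10, 20):
    print(depth, input_gradient(depth))
```

With these settings the gradient at depth 20 is bounded above by 0.25**20, which is why the early layers of a deep sigmoid stack barely get updated. Larger weights or a different activation would change the numbers.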