| HN Mirror

by radq 1137 days ago

We were missing two architecture patterns that were needed to get deeper nets to converge: residual nets [1], which solved gradient propagation, and batch normalization [2], which solved initialization.

[1] Residual nets (2015): https://arxiv.org/abs/1512.03385

[2] Batch normalization (2015): https://arxiv.org/abs/1502.03167
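
For readers who have not seen the two patterns side by side, here is a minimal sketch in plain Python. It is not the code from either paper: the function names (`batch_norm`, `linear`, `residual_block`) are made up for illustration, the learned scale and shift of batch normalization are left out, and the block uses a single linear layer where the ResNet paper stacks convolutions. The point is only the shape of the computation, y = x + F(x), with F normalized per feature across the batch.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature to zero mean and unit variance across the batch.
    Simplification: the learned scale/shift parameters are omitted."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(dims)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps)
             for j in range(dims)]
            for row in batch]

def relu(batch):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in batch]

def linear(batch, weights):
    """Matrix product; weights[i][j] maps input feature i to output feature j."""
    return [[sum(row[i] * weights[i][j] for i in range(len(row)))
             for j in range(len(weights[0]))]
            for row in batch]

def residual_block(batch, weights):
    """y = x + F(x), with F(x) = relu(batch_norm(linear(x))).
    The skip connection adds the input back unchanged, so the identity
    path is always available to the signal and to its gradient."""
    f = relu(batch_norm(linear(batch, weights)))
    return [[x + fx for x, fx in zip(row, frow)]
            for row, frow in zip(batch, f)]
```

With all-zero weights the branch F contributes nothing and the block reduces to the identity, which is one intuition for why stacking many such blocks does not make the signal harder to pass through.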

3 comments

sigmoid10 1136 days ago

Also quasi-linear activation functions (which prevent vanishing gradients), tons of regularisation (e.g. convolutions) and more adaptive gradient descent (faster convergence). I still met people in the early 2010s who tried to make neural networks work using only a few dozen units. Academia is pretty slow. What people also forget is that libraries like PyTorch or TensorFlow simply didn't exist. I wrote my own neural network stacks, complete with backpropagation, from scratch in C++ back then.
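
A toy illustration of the vanishing-gradient point, in plain Python. The sigmoid's derivative never exceeds 0.25, so backpropagating through 20 sigmoid layers multiplies the gradient by at most 0.25^20 (about 9.1e-13), while ReLU passes a gradient of exactly 1 for any positive input. This sketch assumes every layer sees the same pre-activation and fixes all weights at 1, which a real network would not do; the helper names are hypothetical.

```python
import math

def sigmoid_grad(z):
    """Derivative of the logistic sigmoid at z; its maximum is 0.25 at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def chained_gradient(grad_fn, z, depth):
    """Product of `depth` identical local derivatives.
    Toy assumption: weights are 1 and every layer sees the same z."""
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(z)
    return g

if __name__ == "__main__":
    print("sigmoid, 20 layers:", chained_gradient(sigmoid_grad, 0.0, 20))
    print("relu, 20 layers:   ", chained_gradient(relu_grad, 1.0, 20))
```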

bravura 1136 days ago

LeCun et al. (1989) had backprop working for digit recognition.

LeCun, Bottou, et al. (1998) in "Efficient BackProp" described techniques for improving backprop algorithms.

sigmoid10 1136 days ago

Rosenblatt had a working perceptron for classifying images in the 1950s (!). And yet it took 60 years before the theory and compute power had developed enough for all of this to be interesting outside of small, purely academic experiments.

bravura 1135 days ago

Handwriting recognition on checks (LeCun et al. 1989) wasn't really a small, purely academic experiment.

sigmoid10 1134 days ago

And yet classical OCR techniques continued to dominate. Nothing happened in the industry on that front for over 20 years. That's as academic as it gets.

hzay 1137 days ago

Yes, but the tweet is talking about single layer networks!

arketyp 1136 days ago

AlexNet predated that though.
