by sigmoid10
1097 days ago
Also quasi-linear activation functions (which prevent vanishing gradients), tons of regularisation (e.g. convolutions) and more adaptive gradient descent (faster convergence). I still met people in the early 2010s who tried to make neural networks work using only a few dozen units. Academia is pretty slow. What people also forget is that libraries like PyTorch or TensorFlow simply didn't exist. I wrote my own neural network stacks, complete with backpropagation, from scratch in C++ back then.
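A minimal sketch of the vanishing-gradient point, assuming a toy chain of 20 one-unit layers with a fixed weight of 1.0. The function names (`chain_gradient` and friends) are hypothetical and for illustration only, not taken from any library:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never larger than 0.25

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # exactly 1 on the active side

def chain_gradient(act, act_grad, x, depth, weight=1.0):
    """Hand-written backprop through `depth` stacked one-unit layers.

    Each layer computes y = act(weight * y_prev); the return value is
    d(output)/d(input). Toy setup assumed for illustration.
    """
    # Forward pass: remember every pre-activation for the backward pass.
    pre_activations = []
    y = x
    for _ in range(depth):
        z = weight * y
        pre_activations.append(z)
        y = act(z)
    # Backward pass: chain rule, multiplying local derivatives layer by layer.
    grad = 1.0
    for z in reversed(pre_activations):
        grad *= act_grad(z) * weight
    return grad

sig = chain_gradient(sigmoid, sigmoid_grad, 0.5, 20)
rel = chain_gradient(relu, relu_grad, 0.5, 20)
print("sigmoid chain gradient:", sig)
print("relu chain gradient:   ", rel)
```

With the sigmoid, every layer multiplies the gradient by at most 0.25, so it shrinks geometrically with depth. On the active side of the ReLU the local derivative is 1, so the gradient passes through unchanged in this toy case.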
LeCun, Bottou, et al. (2002), in "Efficient BackProp", described techniques for improving backprop algorithms.