by andbberger
2948 days ago
Meh, weird article. None of the nice modern results (the ImageNet family, etc.) were achieved through gradient descent alone. This article, like most, seems to be missing the forest for the trees with deep learning. It's not about the network architecture or gradient descent on their own; it's about their interaction, the dynamical system over weight space that training is. Behind every great modern deep learning result? An enormous hyperparameter search and lots of elbow grease to carefully tune that dynamical system juuuuust right, so the weight particle ends up in just the right place when training finishes. Smells like evolution to me. DeepMind even formalized the evolutionary process that a deep learning researcher runs manually when fine-tuning a model, as population based training: https://deepmind.com/blog/population-based-training-neural-n...
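For anyone who hasn't seen population based training, here's a rough toy sketch of the train / exploit / explore loop as I understand it. To be clear, this is not DeepMind's code: the quadratic objective, the `pbt` function, the learning-rate bounds and the 0.8/1.2 perturbation factors are all assumptions I picked for illustration. Each "worker" is just a scalar weight plus its own learning rate.

```python
import random

LR_MIN, LR_MAX = 0.01, 0.5  # assumed bounds that keep this toy problem stable


def loss(theta):
    # Toy objective (assumption): a parabola with its minimum at theta = 3.
    return (theta - 3.0) ** 2


def grad(theta):
    return 2.0 * (theta - 3.0)


def pbt(pop_size=8, rounds=50, steps_per_round=10, seed=0):
    rng = random.Random(seed)
    # Every worker carries a weight and its own hyperparameter (learning rate).
    population = [
        {"theta": rng.uniform(-10.0, 10.0), "lr": 10 ** rng.uniform(-2, -1)}
        for _ in range(pop_size)
    ]
    cut = max(1, pop_size // 4)
    for _round in range(rounds):
        # Train: each worker runs plain gradient descent with its own lr.
        for worker in population:
            for _step in range(steps_per_round):
                worker["theta"] -= worker["lr"] * grad(worker["theta"])
        # Rank workers from best (lowest loss) to worst.
        population.sort(key=lambda w: loss(w["theta"]))
        for loser in population[-cut:]:
            winner = rng.choice(population[:cut])
            # Exploit: copy the weight of a top performer.
            loser["theta"] = winner["theta"]
            # Explore: perturb the copied learning rate, clipped to the bounds.
            new_lr = winner["lr"] * rng.choice([0.8, 1.2])
            loser["lr"] = min(LR_MAX, max(LR_MIN, new_lr))
    return min(population, key=lambda w: loss(w["theta"]))


if __name__ == "__main__":
    best = pbt()
    print(best, loss(best["theta"]))
```

The point is that the selection step acts on (weights, hyperparameters) pairs, so the hyperparameter schedule gets evolved during training instead of being hand-tuned between runs.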