Hacker News new | ask | show | jobs
by andbberger 2948 days ago
Meh, weird article. None of the nice modern results (imagenet family, etc) were achieved just through gradient descent - this article, like most, seems to be missing the forest for the trees with deep learning.

It's not about the network architecture, or gradient descent on their own - it's the interaction, the dynamical system over weight space that training is.

Behind every great modern deep learning result? An enormous hyperparameter search and lots of elbow grease to carefully tune that dynamical system juuuuust right so the weight particle ends up in just the right place when training finishes. Smells like evolution to me. Deepmind even formalized the evolutionary process a deep learning researcher runs manually when fine-tuning a model into population based training https://deepmind.com/blog/population-based-training-neural-n...

1 comments

"An enormous hyperparameter search and lots of elbow grease to carefully tune that dynamical system" i.e. the classic GDGS (gradient descent by grad student) approach where you have a grad student train a system, decide in which direction the parameters should be updated (i.e. look at the gradient), tweak the system and repeat until convergence.