by carlmcqueen
2654 days ago
Not to spoil the article for anyone, but: it's a pretty in-depth, well laid out explanation of natural gradient descent on a small pre-fit dataset, and the conclusion is 'too computationally expensive for the machine learning/big data world'. This is what I struggled with in school. You'd spend a class week learning some tough stuff only to be told 'this is no longer done, better methods are now used.' Sometimes the work is needed so you can understand why/how the new method is used, but in many cases I didn't find that to be true.
|
Sometimes the natural gradient turns out to be easier to compute than the gradient, and in those cases it has made its way into applications.
Variational Inference: A Review for Statisticians (Blei et al., 2016) https://arxiv.org/abs/1601.00670
In this case, the gradient (51) has the Fisher information in it, and the natural gradient gets rid of that hard-to-compute term. This stuff has been used in a bunch of applications (section 5.1, p. 23).
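To make the cancellation concrete, here is a toy sketch (my own illustration, not the model or notation from the paper). It assumes a one-parameter Poisson family in natural parameters, with the objective -KL(q_lam || p_eta) standing in for the ELBO. The function names are made up for the example. In this setup the ordinary gradient is fisher(lam) * (eta - lam), so multiplying by the inverse Fisher information leaves just eta - lam.

```python
import math

# Toy assumption: Poisson family in natural parameters, where
# lam = log(rate) and the log-normalizer is a(lam) = exp(lam).

def fisher(lam):
    # Fisher information = second derivative of the log-normalizer.
    return math.exp(lam)

def objective(lam, eta):
    # -KL(q_lam || p_eta) for two Poisson distributions with
    # natural parameters lam and eta (a stand-in for an ELBO).
    return -((lam - eta) * math.exp(lam) - math.exp(lam) + math.exp(eta))

def euclidean_grad(lam, eta):
    # Ordinary gradient: the Fisher information appears as a factor.
    return fisher(lam) * (eta - lam)

def natural_grad(lam, eta):
    # Inverse Fisher times the ordinary gradient: the factor cancels.
    return eta - lam

lam, eta = 0.3, 1.7

# Check the closed-form gradient against a central finite difference.
h = 1e-6
fd_grad = (objective(lam + h, eta) - objective(lam - h, eta)) / (2 * h)

print("finite-difference gradient:", fd_grad)
print("closed-form gradient:      ", euclidean_grad(lam, eta))
print("natural gradient:          ", natural_grad(lam, eta))
print("after one unit step:       ", lam + natural_grad(lam, eta))
```

In this toy a single natural-gradient step of size 1 lands on eta, and no Fisher term ever has to be evaluated. Real models are messier, but the structure of the cancellation is the same.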
You could argue that someone pragmatic might have tossed that positive definite term anyway (without knowing about natural gradients), or come up with the same algorithm as a heuristic improvement on existing ones. But the connection is quite neat.
Another example of links to natural gradients helping to make an algorithm cheaper: https://arxiv.org/abs/1712.01038 (Two related papers appeared at ICML 2018.)