by imurray 2655 days ago
I wouldn't recommend this to someone who just wants to use the most practical tools currently available. But I'm glad some people want to know about this stuff.

Sometimes the natural gradient turns out to be easier to compute than the gradient, and in those cases it has made it into applications.

Variational Inference: A Review for Statisticians (Blei et al., 2016) https://arxiv.org/abs/1601.00670

In that paper, the gradient (equation 51) has the Fisher information in it, and the natural gradient gets rid of that hard-to-compute term. This has been used in a number of applications (section 5.1, p. 23).
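
To make the cancellation concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not the model from the paper: q is taken to be a product of Poisson factors with natural parameters lam, and the objective is L(lam) = (eta_hat - lam) . grad A(lam) + A(lam), which equals -KL(q_lam || q_eta_hat) up to a constant. All function and variable names (eta_hat, objective, and so on) are hypothetical. For an exponential family the Fisher information in natural parameters is the Hessian of the log-normalizer A, so the ordinary gradient comes out as Fisher * (eta_hat - lam), and the natural gradient is just eta_hat - lam.

```python
import math

def log_normalizer(lam):
    # A(lam) for independent Poisson factors, lam_i = log(rate_i)
    return sum(math.exp(l) for l in lam)

def mean_params(lam):
    # grad A(lam): expected sufficient statistics (the Poisson rates)
    return [math.exp(l) for l in lam]

def fisher_diag(lam):
    # Hessian of A(lam), i.e. the Fisher information; diagonal for this toy family
    return [math.exp(l) for l in lam]

def objective(lam, eta_hat):
    # L(lam) = (eta_hat - lam) . grad A(lam) + A(lam)
    # (assumed toy objective: -KL(q_lam || q_eta_hat) up to a constant)
    m = mean_params(lam)
    return sum((e - l) * mi for e, l, mi in zip(eta_hat, lam, m)) + log_normalizer(lam)

def euclidean_gradient(lam, eta_hat):
    # Ordinary gradient: Fisher(lam) * (eta_hat - lam)
    return [f * (e - l) for f, e, l in zip(fisher_diag(lam), eta_hat, lam)]

def natural_gradient(lam, eta_hat):
    # Fisher(lam)^{-1} * ordinary gradient: the Fisher term cancels
    return [e - l for e, l in zip(eta_hat, lam)]

def finite_difference_gradient(lam, eta_hat, h=1e-6):
    # Central differences, used only to check the analytic gradient
    grad = []
    for i in range(len(lam)):
        up = list(lam)
        down = list(lam)
        up[i] += h
        down[i] -= h
        grad.append((objective(up, eta_hat) - objective(down, eta_hat)) / (2 * h))
    return grad

lam = [0.0, 1.0, -0.5]       # current variational parameters (made-up values)
eta_hat = [0.7, 0.2, 1.3]    # target natural parameters (made-up values)

# One natural-gradient step with step size 1
stepped = [l + g for l, g in zip(lam, natural_gradient(lam, eta_hat))]
```

With step size 1, a single natural-gradient step lands on eta_hat (up to rounding), and it never needs the Fisher term at all. The ordinary gradient points the same way in this diagonal toy case but is rescaled coordinate-wise by the Fisher information.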

You could argue that someone pragmatic might have tossed that positive definite term anyway (without knowing about natural gradients), or come up with the same algorithm as a heuristic improvement of existing algorithms. But the connection is quite neat.

Another example of links to natural gradients helping to make an algorithm cheaper: https://arxiv.org/abs/1712.01038 (Two related papers appeared at ICML 2018.)