HN Mirror


by carlmcqueen 2654 days ago

Not to spoil the article for anyone, but...

It's a pretty in-depth article with a well-laid-out explanation of natural gradient descent on a small pre-fit dataset, ending with the conclusion that it is 'too computationally expensive for machine learning/big data world'.

This is what I struggled with in school. You'd spend a class week learning some tough stuff only to be told 'this is no longer done, better methods are now used.'

Sometimes the work is needed to allow you to understand why/how the new method is used, but in many cases I didn't find that to be true.

4 comments

imurray 2654 days ago

I wouldn't recommend this to someone wanting to use the currently most practical tools. But I'm glad some people want to know about this stuff.

Sometimes the natural gradient turns out to be easier to compute than the gradient, and then it has made its way into applications.

Variational Inference: A Review for Statisticians (Blei et al., 2016) https://arxiv.org/abs/1601.00670

In this case, the gradient (51) has the Fisher Information in it and the natural gradient gets rid of that hard-to-compute term. This stuff has been used in a bunch of applications (section 5.1, p23).
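A toy illustration of that cancellation, as a minimal sketch rather than anything from the Blei et al. paper: this is plain maximum-likelihood fitting of a 1-D Gaussian in the (mu, sigma) parameterization, not variational inference, and the function names are made up for the example. The ordinary gradient carries the Fisher terms (the 1/sigma^2 factors), and dividing by the Fisher information removes them.

```python
def nll_grad(mu, sigma, xs):
    """Gradient of the average negative log-likelihood of N(mu, sigma^2)
    with respect to (mu, sigma)."""
    n = len(xs)
    mean_d = sum(x - mu for x in xs) / n
    mean_d2 = sum((x - mu) ** 2 for x in xs) / n
    g_mu = -mean_d / sigma**2
    g_sigma = 1.0 / sigma - mean_d2 / sigma**3
    return (g_mu, g_sigma)


def fisher_diag(sigma):
    """Diagonal of the Fisher information for N(mu, sigma^2) in (mu, sigma).
    The matrix is diagonal in this parameterization, so inverting it is
    just an elementwise division."""
    return (1.0 / sigma**2, 2.0 / sigma**2)


def natural_grad(mu, sigma, xs):
    """Natural gradient = inverse Fisher times the ordinary gradient."""
    g = nll_grad(mu, sigma, xs)
    f = fisher_diag(sigma)
    return tuple(gi / fi for gi, fi in zip(g, f))


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]  # hypothetical data, for illustration only
    print("gradient:        ", nll_grad(0.0, 2.0, data))
    print("natural gradient:", natural_grad(0.0, 2.0, data))
```

In this toy case the mu component of the natural gradient is just minus the mean residual, with no dependence on sigma, which is the same flavor of simplification: the Fisher factor in the gradient cancels against the inverse Fisher in the natural gradient.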

You could argue that someone pragmatic might have tossed that positive definite term anyway (without knowing about natural gradients). Or come up with the same algorithm, as a heuristic improvement of existing algorithms. But the connection is quite neat.

Another example of links to natural gradients helping to make an algorithm cheaper: https://arxiv.org/abs/1712.01038 (Two related papers appeared at ICML 2018.)


ska 2653 days ago

> but in many cases I didn't find that to be true.

I wouldn't be so sure. Consider the "opposite" approach in some sense: a person who has only done a rapid online training course in current techniques. If I give them a very specific task that matches their training well, they will probably do OK. If something unexpected happens, though, they will mostly be unable to address it effectively. If a slightly different problem comes up, they won't know how to address that, either. They will have a shallow sense of "how", and very little sense of "why". Worse, they are ill-prepared for adapting to new techniques; I'd probably be just as well off finding a newer version to hire, who has had more recent training...

Sure, not all courses are as good as they could be (nor all lecturers) but the core of what they are trying to teach you is nothing as simple as methods. Methods are the easy part, after all.


serioussecurity 2653 days ago

Strong disagree. I started learning machine learning back in 2010, before neural networks really took off. Understanding where the field was before, and why DNNs were a useful advancement, puts me way ahead of peers who don't understand what problems they're solving. I have an appreciation for the ways that those problems can still creep back into neural network research, despite them apparently being a black box to many researchers.


wenc 2653 days ago

> 'too computationally expensive for machine learning/big data world'

But not for normal-sized datasets...

I've noticed a shift in the past year where (new?) people are thinking ML is synonymous with NN/Deep Learning models.

To me, ML has always encompassed statistical learning techniques, most of which work very well on normal-sized datasets (a thousand to a million rows; definitions differ). Most classical/statistical methods work just fine at this scale, including the method in the linked article.

Lately I've also been thinking: given certain patterns of regularity in data, and outside of certain domains involving images/sound/language, we don't really need large-scale datasets to train our models. Good models can be trained on carefully selected samples without significant loss of fidelity, which opens up the scope of the types of models that can be deployed.
