| HN Mirror


by dplavery92 2769 days ago

This is something of an academic factoid that has nothing to do with the practice of training and using neural networks, or with the merits of deep networks that I was describing above.

Shallow feed-forward networks are "universal function approximators" [0] when the number of hidden neurons is finite but unbounded. Of course, the width of that single hidden layer can grow exponentially in the depth of the deep network that you might wish to approximate [1].
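To make the first half of that concrete, here is a minimal NumPy sketch of a one-hidden-layer network fitting sin(x) at a few widths. Everything in it is an assumption for illustration: the target function, the tanh activation, the weight scales, and the helper names `fit_shallow_net` and `rmse`. The hidden weights are random and left untrained, and only the output layer is fit by least squares, so this is a random-feature stand-in for a trained shallow network. It only shows the fit improving with width on a toy problem; it does not demonstrate the exponential blow-up from [1].

```python
import numpy as np


def fit_shallow_net(x, y, width, seed=0):
    """Fit a one-hidden-layer tanh network to (x, y) and return its predictions.

    Assumption: hidden weights and biases are random and frozen; only the
    output weights are solved for, by ordinary least squares.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=2.0, size=width)        # hidden weights (illustrative scale)
    b = rng.uniform(-np.pi, np.pi, size=width)   # hidden biases
    hidden = np.tanh(np.outer(x, w) + b)         # shape: (n_samples, width)
    out_w, *_ = np.linalg.lstsq(hidden, y, rcond=None)
    return hidden @ out_w


def rmse(a, b):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


x = np.linspace(-np.pi, np.pi, 200)
y = np.sin(x)

# Training error of the shallow network at a few hidden-layer widths.
errors = {width: rmse(fit_shallow_net(x, y, width), y) for width in (2, 10, 100)}
for width, err in errors.items():
    print(f"width={width:4d}  rmse={err:.2e}")
```

The printed error should shrink as the hidden layer widens, which is all the universal approximation statement promises: enough width eventually suffices, with no guarantee that the required width is practical.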

The statement that "[i]t's just that deep neural networks are a lot more practical to train" (emphasis mine) sounds somewhat reductive: depth is not only a nice trick or hack for training speed, it is what made the success of deep networks over the past decade possible at all. We live in a world with bounded computing resources and bounded training data. In the real world, you cannot subsume all deep networks into shallow networks, and shallow networks into SVMs. So I am pretty sure of how important that distinction is.

And what's more, depth extracts a hierarchy of interpretable features at multiple scales [2], and a decision surface embedded within that feature space, rather than a brittle decision surface in an extremely high-dimensional space with little semantic meaning. One of these approaches generalizes better than the other to unseen data.

[0] https://en.wikipedia.org/wiki/Universal_approximation_theore...
[1] https://pdfs.semanticscholar.org/f594/f693903e1507c33670b896...
[2] https://distill.pub/2017/feature-visualization/

2 comments

igorkraw 2768 days ago

An important addition to this is priors: deep networks let you express the prior that a hierarchical representation, i.e. composing multiple layers of abstraction, makes sense (see e.g. conv nets).


yters 2769 days ago

If an SVM kernel can replicate a 2-layer NN, why couldn't there be a kernel for an X-layer NN, and then autoderive the architecture just like SVMs can autoderive the correct number of neurons? Then there'd also be a more robust theoretical understanding of what's happening.


igorkraw 2768 days ago

See my other point: there might be, and in fact there definitely is one for any working NN, but as of now (2019, happy new year) we probably can't find it.
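For a taste of what a kernel for an X-layer NN can look like, here is a minimal sketch of the degree-1 arc-cosine kernel (Cho and Saul, 2009), composed once per layer. It comes with loud caveats: it corresponds to infinitely wide ReLU layers with random Gaussian weights, not to any particular trained network, so it does not settle the question of finding the kernel for a working NN. The function names `arccos_kernel` and `random_relu_kernel` and the test vectors are made up for illustration, and inputs are assumed to be nonzero.

```python
import numpy as np


def arccos_kernel(x, y, depth=1):
    """Degree-1 arc-cosine kernel, composed `depth` times.

    Each pass mimics one infinitely wide ReLU layer with random Gaussian
    weights. Assumes x and y are nonzero vectors of equal length.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    kxx, kyy, kxy = x @ x, y @ y, x @ y
    for _ in range(depth):
        norm = np.sqrt(kxx * kyy)
        cos_theta = np.clip(kxy / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        kxy = norm * (np.sin(theta) + (np.pi - theta) * cos_theta) / np.pi
        # For degree 1, k(x, x) = ||x||^2, so kxx and kyy are unchanged.
    return float(kxy)


def random_relu_kernel(x, y, n_hidden=200_000, seed=0):
    """Monte Carlo estimate of the depth-1 kernel from a wide random ReLU layer."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_hidden, x.size))  # random hidden weights
    hx = np.maximum(w @ x, 0.0)                  # ReLU activations for x
    hy = np.maximum(w @ y, 0.0)                  # ReLU activations for y
    return float(2.0 * np.mean(hx * hy))


x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
print("closed form, 1 layer :", arccos_kernel(x, y, depth=1))
print("random ReLU estimate :", random_relu_kernel(x, y))
print("closed form, 3 layers:", arccos_kernel(x, y, depth=3))
```

The first two printed numbers should roughly agree, which is the sense in which a kernel "replicates" a wide 2-layer NN. The `depth` argument just re-applies the same map, and choosing it is still an architecture decision made by hand rather than something the SVM derives for you.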
