| HN Mirror



	by nightski 2768 days ago
	I thought it had been proven that a two-layer neural network has the same power as a deep one (obviously with a much greater width). It's just that deep neural networks are a lot more practical to train. So I'm not sure how important that distinction is.

3 comments

dplavery92 2768 days ago

This is something of an academic factoid that has nothing to do with the practice of training and using neural networks, or with the merits of deep networks that I was describing above.

Shallow feed-forward networks are "universal function approximators" [0] when the number of hidden neurons is finite but unbounded. Of course, the width of that layer can grow exponentially in the depth of the deep network that you might wish to approximate [1].
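To make the "finite but unbounded width" point concrete, here is a minimal sketch of a one-hidden-layer network whose error shrinks as the hidden layer gets wider. The ReLU activation, the target `math.sin`, the interval, and the helper names (`build_shallow_relu_net`, `evaluate`, `max_error`) are all illustrative assumptions, not anything from the cited references. The weights are written down by hand as a piecewise-linear interpolant rather than trained, so this only illustrates representability.

```python
import math

def relu(z):
    return z if z > 0.0 else 0.0

def build_shallow_relu_net(f, a, b, width):
    """Return (bias, knots, coeffs) of a one-hidden-layer ReLU net that
    linearly interpolates f at width + 1 evenly spaced knots on [a, b]."""
    step = (b - a) / width
    knots = [a + i * step for i in range(width + 1)]
    values = [f(x) for x in knots]
    slopes = [(values[i + 1] - values[i]) / step for i in range(width)]
    # Each hidden unit changes the slope of the output at its own knot.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, width)]
    return values[0], knots[:width], coeffs

def evaluate(net, x):
    bias, knots, coeffs = net
    return bias + sum(c * relu(x - k) for c, k in zip(coeffs, knots))

def max_error(f, net, a, b, samples=2001):
    """Largest absolute error on an evenly spaced grid over [a, b]."""
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - evaluate(net, x)) for x in xs)

a, b = -math.pi, math.pi
errors = {}
for width in (4, 16, 64):
    net = build_shallow_relu_net(math.sin, a, b, width)
    errors[width] = max_error(math.sin, net, a, b)
    print(width, errors[width])
```

A smooth one-dimensional target like this is the easy case: the error falls quickly as units are added. The exponential blow-up mentioned above concerns harder targets, which this sketch does not attempt to reproduce.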

The statement that "[i]t's just that deep neural networks are a lot more practical to train" (emphasis mine) sounds somewhat reductive; it's not only that depth is a nice trick or hack for training speed, but that depth makes the success of deep networks in the past decade at all possible. We live in a world with bounded computing resources and bounded training data. You cannot subsume all deep networks into shallow networks, and shallow networks into SVMs in the real world. So I am pretty sure of how important that distinction is.

And what's more, depth extracts a hierarchy of interpretable features at multiple scales [2], and a decision surface embedded within that feature space, rather than a brittle decision surface in an extremely high-dimensional space with little semantic meaning. One of these approaches generalizes better than the other to unseen data.

[0] https://en.wikipedia.org/wiki/Universal_approximation_theore...
[1] https://pdfs.semanticscholar.org/f594/f693903e1507c33670b896...
[2] https://distill.pub/2017/feature-visualization/


igorkraw 2767 days ago

An important addition to this is priors: deep networks let you express the prior that hierarchical representation, i.e. composing multiple layers of abstraction, makes sense (see e.g. conv nets).


yters 2768 days ago

If an SVM kernel can replicate a 2-layer NN, why couldn't there be a kernel for an X-layer NN, and then autoderive the architecture just like SVMs can autoderive the correct number of neurons? Then there'd also be a more robust theoretical understanding of what's happening.


igorkraw 2767 days ago

See my other point: there might be, in fact there definitely is for any working NN, but as of now (2019, happy new year) we probably can't find it.


slashcom 2768 days ago

An infinitely sized 2 layer NN is universal in the same way a Turing machine is universal — sure you can write any program; God help you if you try.


igorkraw 2767 days ago

If I remember my Goodfellow correctly (and, quickly checking Wikipedia, I did: https://en.wikipedia.org/wiki/Universal_approximation_theore... ), there is a nuance here which is almost always missed: you can approximate any continuous function with a sufficiently wide 2-layer neural network, but the theorem doesn't say anything about being able to tune the network until you find a correct setting (i.e. learnability).

This is important. Flippantly said, discarding learnability and speed of convergence, you can get the power of any neural network with the following algorithm:

1. Randomly generate a sufficiently wide bit pattern.
2. Interpret it as a program and run it on the test set.
3. Discard results until the desired accuracy is reached.
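The three steps above can be sketched as a toy random search. Everything specific here is an assumption made for illustration: the "program" is just a truth table indexed by the input, the task is XOR on two bits, and `random_search` is a hypothetical helper name. A real version would interpret the bits as arbitrary code.

```python
import random

def random_search(examples, n_bits, target_accuracy, rng, max_tries=100_000):
    """Guess random truth tables until one scores well enough on `examples`."""
    for tries in range(1, max_tries + 1):
        # Step 1: draw a random bit pattern.
        bits = [rng.randint(0, 1) for _ in range(n_bits)]
        # Step 2: interpret it as a "program" (here a lookup table) and run it.
        correct = sum(bits[x] == y for x, y in examples)
        accuracy = correct / len(examples)
        # Step 3: discard the guess unless the desired accuracy is reached.
        if accuracy >= target_accuracy:
            return bits, tries
    return None, max_tries

# XOR of two input bits, with the input pair encoded as an index 0..3.
xor_examples = [(0, 0), (1, 1), (2, 1), (3, 0)]
table, tries = random_search(
    xor_examples, n_bits=4, target_accuracy=1.0, rng=random.Random(0)
)
print(table, tries)
```

With a perfect-accuracy target, each guess succeeds with probability 2^-n_bits, so the expected number of tries doubles with every added bit. That is the sense in which the procedure has all the representational power and none of the learnability.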
