
by subtypefiddler 2052 days ago

Whilst I agree with the general sentiment, in this particular instance it has to do with the depth of network that could be trained efficiently thanks to hardware advances. LeNet was 7 layers deep, Dan's 9, VGG's 13, GoogLeNet's 22, etc.

There is theory w.r.t. thick networks as well (e.g. the link to Gaussian processes requires infinite width).

Deep makes sense here.


The_rationalist 2052 days ago

Well, except that most neural networks are not deep: they have a very low number of layers, but each layer can be tremendously wide. This should have been called wide learning. But we could imagine some learning algorithm that exploits depth more than width. A more correct naming would take into account both dimensions: depth and width.

Note that this is orthogonal to sparsity vs. density.


antognini 2051 days ago

The depth seems to matter more than the width, at least as long as the layers are sufficiently wide. In fact, in the limit that the layer becomes infinitely wide, you just end up with a Gaussian process. In practice a width of ~100-1000 is sufficient to get behavior that is pretty close to a Gaussian process, so in general doubling the width of a layer doesn't gain you all that much compared to using those parameters for an extra layer. The real representational power seems to come from increasing depth.
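To put rough numbers on the width-versus-depth trade, here is a minimal sketch that counts parameters in a plain fully connected network. The function name `mlp_param_count` and the sizes used (100 inputs, 10 outputs, width 500, 4 hidden layers) are assumptions chosen for illustration, not figures from the thread. It only counts parameters; it says nothing about what the resulting networks can represent.

```python
def mlp_param_count(d_in, d_out, width, depth):
    """Count weights and biases in a fully connected net with
    `depth` hidden layers, each of the same `width`."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    first = d_in * width + width                    # input -> first hidden layer
    hidden = (depth - 1) * (width * width + width)  # hidden -> hidden layers
    last = width * d_out + d_out                    # last hidden -> output
    return first + hidden + last


# Hypothetical sizes, chosen only to illustrate the arithmetic.
base = mlp_param_count(d_in=100, d_out=10, width=500, depth=4)
wider = mlp_param_count(d_in=100, d_out=10, width=1000, depth=4)
deeper = mlp_param_count(d_in=100, d_out=10, width=500, depth=13)

print(base, wider, deeper)
```

Because the hidden-to-hidden weight matrices grow with the square of the width, doubling the width here costs slightly more parameters than going from 4 to 13 hidden layers at the original width.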


canjobear 2051 days ago

Around the time the phrase "deep learning" came into vogue, the advances were indeed in training deeper networks, not wider. Later on it turned out that shallow wide networks are sufficient for many problems. (Also, it turned out the pre-training tricks that people came up with for training deep networks weren't really necessary either.)


subtypefiddler 2052 days ago

It's also important to note that they work despite being wide. You can see that in the effectiveness of pruning, and in ideas such as the lottery ticket hypothesis, which states that "successful" sub-networks within the wide network account for most of the performance.
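For readers unfamiliar with pruning, a minimal sketch of one-shot magnitude pruning on a flat list of weights follows. The helper `magnitude_prune` and the example weights are hypothetical; the lottery ticket experiments additionally retrain the surviving weights, which this sketch does not do.

```python
def magnitude_prune(weights, fraction):
    """Return a copy of `weights` with the smallest-magnitude
    `fraction` of entries set to 0.0."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n_prune = int(len(weights) * fraction)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Made-up weights for a single layer, for illustration only.
layer = [0.8, -0.05, 0.01, -1.2, 0.3, 0.02, -0.4, 0.07]
sparse = magnitude_prune(layer, 0.5)
print(sparse)
```

The half of the weights that survive are the large-magnitude ones; the claim in the comment above is that, in trained wide networks, such a surviving subset carries most of the performance.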

In the theory literature, if you have a K-deep network, K=1 is the shallow case and K>1 is deep. Agreed, naming could be better, but it's not like "deep work" or "deep thoughts" as the parent was stating.


laleopue 2051 days ago

The adjective "deep" came from deep belief networks, which are a variation on restricted Boltzmann machines. RBMs have one visible and one hidden layer; DBNs have more hidden layers, hence "deep". So it's not exactly based on a distinction between "deep" and "shallow" models.


sdenton4 2051 days ago

I dunno, in the ResNet age, many and perhaps most networks are 20+ layers. I feel like the shallowest networks I see these days are RNNs being used for fast on-device ML, which tend not to be terribly wide due to the same hardware constraints.
