by wookietrader 4962 days ago
No.

All currently used deep learning algorithms are special cases of neural networks. The reason why this is called "deep" learning is that before 2006, no one knew how to efficiently train neural nets with more than 1 or 2 hidden layers (or could, given the computing power available). Thanks to a breakthrough by Dr. Hinton, this is now possible.

But all models used are neural nets. It's just that a vast number of new algorithms for training them have been developed in the last few years, and people have come up with new ideas on how to use them.

But it is all neural nets. And that's the whole beauty of it.


Closer, but still no :) Geoff Hinton proposed contrastive divergence training for Restricted Boltzmann Machines in his 2006 Science paper. CD does not apply outside of RBMs though, and most of the nets in the article here are not in fact RBMs. The paper did spark a lot of interest in the field though.

These are all neural nets (with some bells and whistles in some cases like tied weights, pooling units, etc) trained exactly as they were trained before using stochastic gradient descent or LBFGS. We did come up with a lot of tricks for making SGD work though, like momentum terms, clamping of weights during learning, dropout, unsupervised pretraining, etc., but in large part it's just a lot more compute power. These networks just turned out to work very well when you have a LOT of (fairly homogeneous) data and can afford to scale them up computationally. And that's pretty awesome, looks like we have a powerful hammer and there are plenty of nails lying around :)
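
To make one of those tricks concrete, here is a minimal sketch of the momentum update. It is only an illustration under assumptions of mine: the names (`sgd_momentum`, `toy_grad`) and the toy quadratic are made up, and it uses the exact gradient rather than a noisy mini-batch gradient, so it shows the update rule, not a real training loop.

```python
def sgd_momentum(grad, w0, lr=0.05, momentum=0.9, steps=500):
    """Gradient descent with a classical momentum term.

    grad: function mapping a weight list to its gradient list.
    In real SGD this gradient would come from a random mini-batch;
    here it is the exact gradient of a toy objective.
    """
    w = list(w0)
    velocity = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        # The velocity is a decaying running sum of past gradient steps.
        velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
        w = [wi + v for wi, v in zip(w, velocity)]
    return w


def toy_grad(w):
    # Gradient of f(w) = (w[0] - 3)^2 + 10 * (w[1] + 1)^2 (hypothetical objective).
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


w_final = sgd_momentum(toy_grad, [0.0, 0.0])
print(w_final)  # should end up close to the minimum at [3, -1]
```

The two directions of the toy objective have very different curvature, which is the kind of situation where the velocity term is usually said to help.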

That is not entirely accurate. The Science paper described how to (pre)train a deep belief net by training a sequence of RBMs. Contrastive divergence for RBM training (and more generally products of experts) was described in 2002 in "Training Products of Experts by Minimizing Contrastive Divergence" http://www.cs.toronto.edu/~hinton/absps/nccd.pdf
doh, not very carefully worded now that I'm re-reading my answer, you're right of course. Well, at least we're slowly converging on the right answer over several comments :)
What exactly is wrong with what I wrote? I did not say that all nets nowadays would be trained by RBMs (on the contrary, I said quite the opposite, that new algorithms had been developed). I just said that they were part of the breakthrough.
What are your thoughts re: LBFGS vs HF as applied to FF networks? I've been using HF for RNNs and have been having very good results, but I haven't yet tried it on FF networks and wonder if I'd see a benefit compared to SGD with the bells and whistles or even something like LBFGS.
Are you talking about Hinton's "A Fast Learning Algorithm for Deep Belief Nets"? Before that was published, Hinton's lab and their spiritual allies were training large restricted Boltzmann machines via truncated sampling for decades. And Yann LeCun's convolutional networks (the architecture used in Google's vision project) have also been trained via plain old stochastic gradient descent for decades.

As far as I can tell there hasn't been any single revolutionary breakthrough in this field...we just keep getting more computing power, discovering better tricks and heuristics, and trying to build larger and larger networks.

I'm guessing the "pretraining" described in this 2006 Science article: http://www.cs.toronto.edu/~hinton/science.pdf. (Possibly the same line of research as the article you mention.) Sure, if you look at things from a wide enough perspective, there haven't been any "revolutionary" breakthroughs. But this did seem to reignite interest in neural nets after they had sort of languished for a while. (Science described this work, somewhat hyperbolically, as "Neural nets 2.0".)
I think culturally, Hinton made a big splash and got people to pay attention to learning hierarchies and SGD-like training algorithms. Algorithmically, though, SGD is both ancient and still the dominant deep learning training technique (though useful tricks, extensions, and rules of thumb keep accumulating).
That's a very wide classification. I could say everything is a machine algorithm since they all run on general-purpose CPUs.
On the other hand, there is nothing a neural net can do that a Turing machine can't do (perhaps even better?).
There is no "a neural net". Which model do you mean? Certainly not the deep belief networks under discussion in the article.

edit: I'm not sure I was clear enough-- the term "neural network" is a misnomer that encompasses extremely different models that are largely unrelated except for being vaguely inspired by the brain. A vanilla multilayer-perceptron is essentially a generalization of logistic regression. Restricted Boltzmann Machines are different beasts-- they're a restriction of undirected graphical models made amenable to efficient training. Recurrent neural networks aren't in any way a minor extension of other neural networks-- you need different terminology to talk meaningfully about them and they essentially don't have reliable training algorithms. This latter class can be viewed as Turing-equivalent computation, but they're not at all the same as the models in the original article.

Neural Nets cannot loop (unless they are recurrent neural nets) and are memory bound.
That's also what mojuba was referring to.
What was the breakthrough?
The breakthrough was the insight that while you cannot train a deep neural net at once with backprop, you can train one layer after the other greedily with an unsupervised objective and later fine tune it with standard backprop.
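
A rough sketch of the greedy part, under assumptions of mine: it uses tiny tied-weight autoencoders as the unsupervised objective (not the RBMs of the 2006 work), all function names are made up for illustration, and the final supervised fine-tuning pass with backprop is left out.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode(W, b, x):
    """Hidden activations of one sigmoid layer; W is n_hidden x n_in."""
    return [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def train_autoencoder(data, n_hidden, epochs=50, lr=0.5, seed=0):
    """Train one tied-weight sigmoid autoencoder with SGD on squared error."""
    rng = random.Random(seed)
    n_in = len(data[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden  # hidden biases
    c = [0.0] * n_in      # reconstruction biases
    for _ in range(epochs):
        for x in data:
            h = encode(W, b, x)
            r = [sigmoid(sum(W[i][j] * h[i] for i in range(n_hidden)) + c[j])
                 for j in range(n_in)]
            # Error signals at the reconstruction and hidden pre-activations.
            dr = [(r[j] - x[j]) * r[j] * (1.0 - r[j]) for j in range(n_in)]
            dh = [sum(W[i][j] * dr[j] for j in range(n_in)) * h[i] * (1.0 - h[i])
                  for i in range(n_hidden)]
            for i in range(n_hidden):
                for j in range(n_in):
                    # Tied weights: decoder term plus encoder term.
                    W[i][j] -= lr * (dr[j] * h[i] + dh[i] * x[j])
                b[i] -= lr * dh[i]
            for j in range(n_in):
                c[j] -= lr * dr[j]
    return W, b


def greedy_pretrain(data, layer_sizes):
    """Train layers one at a time; each new layer only sees the frozen
    output of the layers below it."""
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        W, b = train_autoencoder(inputs, n_hidden)
        layers.append((W, b))
        inputs = [encode(W, b, x) for x in inputs]  # freeze and propagate up
    return layers
```

The returned `(W, b)` pairs would then serve as the initial weights of a feed-forward net that gets fine-tuned end to end.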

Years later, Swiss researchers (Dan Ciresan et al.) found that you can train neural nets with backprop, but you need lots of training time and lots of data. You can only achieve this by making use of GPUs; otherwise it would take months.

You can't train fully connected deep models with backprop, or at least not easily or well. An alternative solution to this problem is spatial weight pooling (Yann's convolutional networks) which play well with SGD.
That is correct. The problem is that the gradients get smaller and smaller as you backpropagate toward the input layer, so learning in the layers nearest the input is slow. Hinton has a lot of good material about this in his Coursera lectures.
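A toy illustration of the shrinking gradient, assuming a chain of single-unit sigmoid layers that all share one weight `w` (a deliberately simplified setup of mine, not a real network; `gradient_magnitudes` is a made-up name). Since the sigmoid's derivative is at most 0.25, each layer multiplies the backpropagated signal by at most 0.25 when `w = 1`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gradient_magnitudes(depth, w=1.0, x=0.5):
    """Return |d output / d input-of-layer-k| for a chain of 1-unit sigmoid layers.

    The first entry belongs to the layer nearest the output,
    the last entry to the layer nearest the input.
    """
    # Forward pass: store every activation.
    acts = [x]
    for _ in range(depth):
        acts.append(sigmoid(w * acts[-1]))

    # Backward pass: multiply by one local derivative per layer.
    grad = 1.0
    mags = []
    for a in reversed(acts[1:]):
        grad *= w * a * (1.0 - a)  # d sigmoid(w * h) / d h, with a = sigmoid(w * h)
        mags.append(abs(grad))
    return mags


for k, m in enumerate(gradient_magnitudes(10)):
    print(f"{k + 1} layer(s) below the output: {m:.3e}")
```

Larger weights can offset the shrinkage to some degree, so treat this as the intuition rather than a statement about every network.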
Yes you can.

Check out the publications by Ciresan on MNIST, have a look at Hinton's dropout paper or at the Kaggle competition that used deep nets. Or try it yourself and spend a decent amount of time on hyperparameter tuning. :)

Which of Ciresan's projects are you referring to? Everything I've seen by him uses convolutional layers of some sort.
The first time I saw a paper on feasible deep networks was at NIPS 2006, specifically this paper: http://www.cs.toronto.edu/~hinton/absps/fastnc.pdf

It's been a while since I read the paper, but as I recall it involved training an unsupervised model layer-by-layer (training a layer, freezing the weights, then training another layer on top of it).

http://www.socher.org/index.php/DeepLearningTutorial/DeepLea... is also a good reference. I wrote a short blog post this morning on the same subject http://blog.markwatson.com/2012/11/deep-learning.html
Contrastive Divergence.

The deep learning / RBM tutorial here is quite good and explains the technique.

http://deeplearning.net/tutorial/rbm.html#rbm
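
For anyone who wants to see the shape of the update before reading the tutorial, here is a minimal CD-1 sketch for a binary RBM. It is not the tutorial's code: the function names are made up, it is plain Python with no GPU or mini-batching, and it takes a single Gibbs step per update, so treat it as an illustration of the idea only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def hidden_probs(W, b_h, v):
    # W is n_visible x n_hidden.
    return [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_h))]


def visible_probs(W, b_v, h):
    return [sigmoid(b_v[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b_v))]


def cd1_update(W, b_v, b_h, v0, lr, rng):
    """One CD-1 step on a single binary training vector (updates in place)."""
    ph0 = hidden_probs(W, b_h, v0)   # positive phase: hiddens driven by data
    h0 = sample(ph0, rng)
    pv1 = visible_probs(W, b_v, h0)  # one Gibbs step: reconstruct the visibles
    v1 = sample(pv1, rng)
    ph1 = hidden_probs(W, b_h, v1)   # negative phase: hiddens driven by reconstruction
    for i in range(len(v0)):
        for j in range(len(ph0)):
            # Data correlation minus reconstruction correlation.
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b_v[i] += lr * (v0[i] - v1[i])
    for j in range(len(ph0)):
        b_h[j] += lr * (ph0[j] - ph1[j])


def train_rbm(data, n_hidden, epochs=100, lr=0.1, seed=0):
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    b_v = [0.0] * n_visible
    b_h = [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            cd1_update(W, b_v, b_h, v0, lr, rng)
    return W, b_v, b_h
```

Calling `train_rbm([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]], n_hidden=2)` returns the weight matrix and both bias vectors. Stacking several such RBMs, each trained on the hidden activations of the one below, is the layer-by-layer pretraining discussed elsewhere in this thread.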