by discardorama
4016 days ago
If someone were to ask me what's wrong with DL (not that anyone would, since I'm an unknown), I'd say the lack of theory. Most DL results look very hacky to me. Someone says max pooling works; someone else comes along and says it's not necessary. Someone says sigmoid or tanh are the best activation functions; someone else says ReLUs are better. And so on. Why? Why is one better than the other? I'm no biologist, but I don't think our brains are going around doing a grid search for the best hyperparameters. Most DL results today are the result of throwing 1000s of Titans at the problem and then sitting back for a week for the beast to cough up a solution.

Tangential nitpick: one (very minor) nit I have with Prof. LeCun's presentations is that I don't see him give more credit to Hinton and Schmidhuber. Hinton is mentioned a couple of times (3), but Schmidhuber is totally ignored; for example, when he mentions LSTM, it's cited as [Hochreiter 1997], even though it was a joint publication with Schmidhuber. It should be cited as [Hochreiter et al. 1997], as he does in the very next line.
|
I disagree re: the 1000s of Titans thing. Google, Baidu, etc. are building large GPU clusters and have basically shown "similar resources = similar results", but everyone else is mostly using a single machine--maybe multi-GPU--and doing fine. A single Titan X is a BEAST for deep learning--nobody is using 1000s and you only need 1 for great results on most datasets I've seen.
On the subject of Schmidhuber, I saw him speak once and he spent half the talk explaining how he invented everything he's talking about (EVERYTHING!) and the other half talking about how no one gives him credit. I'm half joking, but I think there's more to his story. Or it's a miscarriage of justice.