by nl
3761 days ago
It's easy to say that now, in the same way that nobody today would be surprised if a working quantum computer could factor large numbers in polynomial time. Ten years ago no one believed it was possible to train deep nets [1]. It wasn't until the current "revolution" that people learned how important parameter initialization is. Sure, that's not a new algorithm, but it made the problem tractable.

As far as algorithmic innovations go, there's always ReLU (2011) and leaky ReLU (2013). The one-weird-trick paper was pretty important too.

[1] "Training deep multi-layered neural networks is known to be hard. The standard learning strategy—consisting of randomly initializing the weights of the network and applying gradient descent using backpropagation—is known empirically to find poor solutions for networks with 3 or more hidden layers. As this is a negative result, it has not been much reported in the machine learning literature. For that reason, artificial neural networks have been limited to one or two hidden layers." http://deeplearning.cs.cmu.edu/pdfs/1111/jmlr10_larochelle.p...