|
|
|
|
|
by karpathy
4973 days ago
|
|
Yes, but just as with normal feature engineering, don't make the mistake of thinking that these methods are fully automatic work by magic. There is no free lunch. A common criticism with these methods is that they merely shift engineering from features to parameters that specify the architecture. There are many choices to be made: The exact number of layers, number of neurons per layer, the connectivity, sparsity parameters, non-linearities, sizes of receptive fields, learning rates, weight decays, pre-training schedule etc etc etc. Perhaps even worse, while you can use intuition to design features, it is not as trivial to see if you should be using a sigmoid, tanh, or rectified linear units (+associated parameters for each) in the 3rd layer of the network. And maybe even worse, these parameters can actually have quite a strong effect on the final performance. These are still powerful models and we are learning a lot about what works and what doesn't (and I'm optimistic) but don't make the mistake of thinking they are automatic. For now, you need to know what you're doing. |
|
And parameter selection isn't nearly as involved in practice--but it is important to know what you're doing because you'll basically be translating from papers to code in order to get a working implementation, at least until some startup comes out with a plug and play deep learning paas
For anyone interested in playing with the tools that are available, deeplearning.net provides a set of very good tutorials with working code.