by agibsonccc 4399 days ago
To address some of the comments here: neural nets, despite being harder to train, can be debugged visually.

A few tips for those of you who use neural nets:

Debug the weights with histograms. Track the gradient and make sure its magnitude is not too large and that it's normally distributed.
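A rough sketch of that check in plain Python, in case it helps. The function names, the `max_norm` threshold, and the "fraction within one standard deviation" test are my own assumptions for illustration, not something from a particular framework; about 0.68 of values fall within one standard deviation for a normal distribution.

```python
import math
import statistics

def text_histogram(values, bins=10, width=40):
    """Print a crude text histogram and return the per-bin counts."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        left = lo + span * i / bins
        print(f"{left:+.4f} | {'#' * round(width * c / peak)}")
    return counts

def gradient_report(grads, max_norm=10.0):
    """Summarize a flat list of gradient values (max_norm is an assumed threshold)."""
    norm = math.sqrt(sum(g * g for g in grads))
    mean = statistics.fmean(grads)
    std = statistics.pstdev(grads)
    # Fraction of values within one standard deviation of the mean.
    within = sum(abs(g - mean) <= std for g in grads) / len(grads)
    return {
        "norm": norm,
        "mean": mean,
        "std": std,
        "within_1_std": within,
        "too_large": norm > max_norm,
    }
```

Call `text_histogram` on the flattened weights or gradients of a layer every few iterations and eyeball the shape; `gradient_report` gives you numbers to log next to it.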

Keep track of your gradient changes when using either gradient descent or conjugate gradient.

Plot your filters to visualize what each neuron is learning.

Watch the rate of change of your cost function. If it seems like it's changing too fast and stops early, lower your learning rate.
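One way to turn that into a check, as a sketch only. The thresholds `fast` and `flat` and the halving factor `shrink` are assumptions I picked for illustration; tune them for your own problem.

```python
def relative_changes(costs):
    """Relative change in cost between consecutive iterations."""
    return [(b - a) / abs(a) for a, b in zip(costs, costs[1:]) if a != 0]

def suggest_learning_rate(costs, lr, fast=0.5, flat=1e-4, shrink=0.5):
    """Shrink lr if the cost dropped very fast early on and has since stalled."""
    changes = relative_changes(costs)
    if len(changes) < 2:
        return lr
    early = changes[: max(1, len(changes) // 2)]
    dropped_fast = min(early) < -fast      # a big relative drop in the first half
    stalled = abs(changes[-1]) < flat      # almost no movement at the end
    return lr * shrink if dropped_fast and stalled else lr
```

For example, a cost history like `[10.0, 2.0, 1.9, 1.9, 1.9, 1.9]` would trigger the shrink, while a steady decline would leave the learning rate alone.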

Plot your activations: if they start out grey you're fine. If you start all black, you need to retune some of your parameters.

Lastly, understand the algorithm you're using. Convolutional nets are different from recursive neural tensor networks, which are different from denoising autoencoders, which are different from RBMs/DBNs.

Pay attention to your cost function: reconstruction entropy and negative log likelihood are used for different objectives.

If you are trying to do feature learning, you are using RBMs or denoising autoencoders, and you will use reconstruction entropy. This is what you use for feature detectors. You may end up using negative log likelihood if you are dealing with continuous data.
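For reference, here is a minimal sketch of the two costs. I'm assuming "reconstruction entropy" means the usual cross-entropy between an input and its reconstruction, and showing negative log likelihood in its common form over predicted class probabilities; the function names are made up for the example.

```python
import math

def _clamp(p, eps=1e-12):
    """Keep probabilities away from 0 and 1 so log() stays finite."""
    return min(max(p, eps), 1.0 - eps)

def reconstruction_cross_entropy(x, z):
    """Cross-entropy between inputs x in [0, 1] and reconstructions z in (0, 1)."""
    return -sum(
        xi * math.log(_clamp(zi)) + (1.0 - xi) * math.log(1.0 - _clamp(zi))
        for xi, zi in zip(x, z)
    )

def negative_log_likelihood(probs, label):
    """Negative log of the probability the model assigns to the true label."""
    return -math.log(_clamp(probs[label]))
```

The first compares a whole input vector with its reconstruction, which is why it fits feature learning; the second only looks at the probability assigned to one target.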

For RBMs, pay attention to the different kinds of units[1]. Hinton recommends Gaussian visible units with rectified linear hidden units for continuous data, binary-binary otherwise.

For denoising autoencoders, watch your corruption level. A higher level helps the model generalize better, especially with less data.
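To make the corruption level concrete, here is a sketch of masking noise, one common way to corrupt inputs for a denoising autoencoder (zero each input with probability equal to the corruption level). The `corrupt` helper is hypothetical, not taken from any library.

```python
import random

def corrupt(x, corruption_level, rng=random):
    """Zero each input value with probability corruption_level (masking noise)."""
    if not 0.0 <= corruption_level <= 1.0:
        raise ValueError("corruption_level must be between 0 and 1")
    return [0.0 if rng.random() < corruption_level else v for v in x]
```

The autoencoder is then trained to reconstruct the original `x` from `corrupt(x, level)`, so a higher level forces it to rely more on structure shared across inputs.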

For time series or sequential data, you can use a recurrent net, a moving window with DBNs, or a recursive neural tensor network.
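The moving window option is just a preprocessing step: slice the series into fixed-length overlapping chunks and feed each chunk in as one example. A minimal sketch, with `window` and `step` as illustrative parameters:

```python
def moving_windows(series, window, step=1):
    """Split a sequence into overlapping fixed-length windows."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        series[i:i + window]
        for i in range(0, len(series) - window + 1, step)
    ]
```

With `window=3`, the series `[1, 2, 3, 4, 5]` becomes three examples: `[1, 2, 3]`, `[2, 3, 4]`, and `[3, 4, 5]`.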

Other knobs:

If your deep learning framework doesn't have AdaGrad, find one that does.
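If you haven't seen it, AdaGrad gives each parameter its own effective learning rate by dividing by the root of the accumulated squared gradients. A bare-bones sketch on flat Python lists (the function name and default values are assumptions for the example):

```python
import math

def adagrad_step(params, grads, cache, lr=0.01, eps=1e-8):
    """Apply one AdaGrad update in place and return (params, cache)."""
    for i, g in enumerate(grads):
        cache[i] += g * g  # running sum of squared gradients for this parameter
        params[i] -= lr * g / (math.sqrt(cache[i]) + eps)
    return params, cache
```

Parameters that keep seeing large gradients get smaller and smaller steps, which takes some of the pain out of hand-tuning a single global learning rate.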

Dropout: crucial. Dropout is used in combination with mini-batch learning to handle learning different "poses" of images as well as generalizing feature learning. This can be used in combination with sampling with replacement to minimize sampling error.
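A sketch of what dropout does to a layer's activations. Note this is the "inverted" variant, which rescales the kept units during training so nothing needs to change at test time; the original formulation scales at test time instead. The helper name is made up.

```python
import random

def dropout(activations, p_drop, rng=random, training=True):
    """Inverted dropout: zero units with probability p_drop, rescale the rest."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    # Dividing by keep preserves the expected value of each activation.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

A fresh mask is drawn on every call, so each mini-batch effectively trains a different thinned network.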

Regularization: L2 is typically used. Hinton once said: you want a neural net that always overfits but is regularized (YouTube video; I don't remember the link right now).
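For completeness, L2 regularization just adds a penalty on the squared weights to the cost, which shows up in the gradient as a pull toward zero. A small sketch, assuming the common 0.5 * lam * sum(w^2) form (`lam` is the regularization strength; names are illustrative):

```python
def l2_penalty(weights, lam):
    """0.5 * lam * sum of squared weights."""
    return 0.5 * lam * sum(w * w for w in weights)

def l2_regularized(cost, grads, weights, lam):
    """Return the cost and gradients with the L2 term added."""
    reg_cost = cost + l2_penalty(weights, lam)
    # The derivative of 0.5 * lam * w^2 with respect to w is lam * w.
    reg_grads = [g + lam * w for g, w in zip(grads, weights)]
    return reg_cost, reg_grads
```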

Would love to answer questions! Source: I work on/teach this stuff. Still working my way up there, but it seems to be going well so far.[2][3]

Lastly, tweak one knob at a time. Neural nets have a lot going on. You don't want a situation where you A/B tested 10 different parameters at once and you don't know which one worked or why.
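If you script your experiments, one way to enforce this is to generate runs that each differ from a baseline in exactly one setting. A sketch with hypothetical parameter names:

```python
def one_at_a_time(baseline, candidates):
    """Build (knob_name, config) runs that each change a single baseline value."""
    runs = []
    for name, values in candidates.items():
        for value in values:
            if value == baseline[name]:
                continue  # identical to the baseline run, nothing to learn
            config = dict(baseline)
            config[name] = value
            runs.append((name, config))
    return runs

# Hypothetical knobs, purely for illustration.
baseline = {"learning_rate": 0.1, "dropout": 0.5, "l2": 1e-4}
candidates = {"learning_rate": [0.1, 0.01], "dropout": [0.2, 0.5]}
for name, config in one_at_a_time(baseline, candidates):
    print(name, config)
```

Each run can then be compared directly against the baseline, so you know which knob caused the difference.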

[1]: http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf

[2]: http://deeplearning4j.org/

[3]: http://zipfianacademy.com/

[4]: http://arxiv.org/abs/1206.5533 http://deeplearning4j.org/ http://deeplearning4j.org/debug.html http://yosinski.com/media/papers/Yosinski2012VisuallyDebuggi...

2 comments

Nice to see you on HN, Adam =)

We just opened up the roadmap for contributions (click "view source" with a logged-in account). Feel free to add any of these notes where you think they'd fit in nicely -- don't worry about messing anything up, we have version control for a reason. Also, please email me if you run into any problems/confusion.

Will do! Like we discussed before, great initiative!
What do you think about the Google Convnet platform? I got it running and played with the supplied configurations, but it seems that it hasn't been updated in a while; for example, there's no dropout.

Then there's also Caffe CNN from the Berkeley Vision group. I'm not sure what the differences between the two are.

Which one would you recommend as a learning tool, speed-wise, and as a possible starting point for customization?