by vintermann 3724 days ago
Already, it's not nearly as hard as this demo makes it look. There's one recent advance in particular that isn't in this demo, and that is Batch Normalization.

If you've played around with it a bit, I'm sure you have seen that deeper layers are hard to train: the dashed lines representing signal in the network become weaker and weaker as the network gets deeper. BatchNorm works wonders here. It takes statistics (mean and variance) from the minibatch of training examples and uses them to normalize each layer's output, so that the next layer gets input more similar to what it expects, even if the previous layer has changed. In practice you get a much better signal, so the network can learn a lot more efficiently.
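
To make the normalization step concrete, here is a rough sketch for the activations of a single unit across one minibatch. It uses only the standard library and covers the training-time forward pass only; the running averages used at inference are left out. The names `batch_norm`, `gamma`, `beta` and `eps` are my own choices for illustration. In a real network `gamma` and `beta` are learned parameters.

```python
from math import sqrt
from statistics import fmean, pvariance


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one unit's activations over a minibatch, then scale and shift.

    batch: list of floats, one activation per training example.
    gamma, beta: scale and shift (learned in practice, fixed here).
    eps: small constant that avoids division by zero.
    """
    mean = fmean(batch)               # minibatch mean
    var = pvariance(batch, mu=mean)   # minibatch (population) variance
    return [gamma * (x - mean) / sqrt(var + eps) + beta for x in batch]


# Activations with a large offset and spread come out centered near zero
# with roughly unit variance, whatever the previous layer was doing.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)
```

With the defaults, the output has mean close to 0 and variance close to 1, which is the "input more similar to what it expects" part.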

Without BatchNorm, more than two hidden layers is tedious and error-prone to train. With it, you can train 10-12 layers easily. (With another recent advance, residual nets, you can train hundreds!)
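
The residual idea fits in a few lines too. This is only a toy sketch, with `residual_block` and `zero_layer` as made-up names: each block outputs its input plus whatever the layer computes, so a layer that contributes nothing still passes the signal through unchanged.

```python
def residual_block(x, layer):
    """Return x + layer(x) elementwise; `layer` is any function on a list of floats."""
    return [xi + yi for xi, yi in zip(x, layer(x))]


def zero_layer(x):
    # Stand-in for a layer that has not learned anything yet (outputs all zeros).
    return [0.0 for _ in x]


signal = [0.5, -1.0, 2.0]
out = signal
for _ in range(100):  # stack 100 blocks
    out = residual_block(out, zero_layer)
```

Even after 100 blocks the signal is still intact, because the skip connection carries it past every layer.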

Such advances push the limit of what you can train easily, as opposed to what still requires GSD ("graduate student descent": figuring out just the right parameters to get something to work through intuition, trial and error). You still have to watch out for overfitting, but the nice thing about that is that more training data helps.