by walrus 3725 days ago
I'm not qualified to answer this, but I will anyway.

To "operate" neural networks (as opposed to writing a framework for them), you need to know the building blocks. There are basic blocks like fully connected layers, convolutions, and nonlinear activations. Beyond those, there are higher-level building blocks like LSTMs[1], gated recurrent units[2], highway layers[3], batch normalization[4], and residual blocks[5] that are made up of simpler blocks. Learning what these do and when it's appropriate to use them means keeping up with the current literature.
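To make "higher-level blocks made up of simpler blocks" concrete, here is a minimal sketch in plain Python with no framework. The helper names `dense`, `relu`, and `residual_block` are made up for illustration, and the block is a simplification: the residual blocks in [5] use convolutions and batch normalization on the residual branch rather than a single fully connected layer.

```python
def dense(weights, bias):
    """Fully connected layer: returns a function computing W x + b."""
    def forward(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return forward


def relu(x):
    """Elementwise nonlinear activation."""
    return [max(0.0, v) for v in x]


def residual_block(layer):
    """Compose simpler blocks into x + relu(layer(x)).

    The input is added back to the branch output (the skip connection),
    so the layer must map a vector to a vector of the same length.
    """
    def forward(x):
        branch = relu(layer(x))
        return [xi + bi for xi, bi in zip(x, branch)]
    return forward


# A 2-unit layer with hand-picked weights (illustrative values only).
block = residual_block(dense([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]))
print(block([2.0, 3.0]))  # layer gives [2, -3], relu gives [2, 0], skip adds [2, 3]
```

One thing the sketch shows: if the wrapped layer outputs all zeros, the block simply passes its input through unchanged, so stacking many such blocks does not by itself destroy the signal.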

Operating neural networks requires some systems engineering skill. It takes a long time to train a single network and you'll find yourself trying many different architectures and hyperparameters along the way. Because of this, you'll want to distribute the training across many different systems and be able to easily monitor and deploy jobs on those systems.

A solid grasp of mathematics is useful to effectively debug your networks. You'll frequently find your network doesn't converge or gives totally garbage results, so you need to know how to dig into the network internals and understand how everything works. This is especially true if you're implementing a new building block from a paper.
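One common way to dig into the internals (my example, not the only approach) is a gradient check: compare the gradient you derived by hand against a finite-difference estimate. The toy loss and the names `loss`, `analytic_gradient`, and `numerical_gradient` below are assumptions for illustration only.

```python
def numerical_gradient(f, params, h=1e-5):
    """Estimate df/dparams[i] with central differences."""
    grads = []
    for i in range(len(params)):
        plus = params[:i] + [params[i] + h] + params[i + 1:]
        minus = params[:i] + [params[i] - h] + params[i + 1:]
        grads.append((f(plus) - f(minus)) / (2 * h))
    return grads


# Toy model: one linear unit, squared error on a single example.
X, Y = 2.0, 1.0


def loss(params):
    w, b = params
    return (w * X + b - Y) ** 2


def analytic_gradient(params):
    """Hand-derived gradient of loss with respect to [w, b]."""
    w, b = params
    err = w * X + b - Y
    return [2 * err * X, 2 * err]


params = [0.5, -0.3]
num = numerical_gradient(loss, params)
ana = analytic_gradient(params)
# A large gap here usually means the hand-derived gradient has a bug.
max_gap = max(abs(n - a) for n, a in zip(num, ana))
print("max gap between numerical and analytic gradient:", max_gap)
```

If the two disagree by more than rounding noise, the bug is usually in the hand-written backward pass, which is exactly the situation you hit when implementing a new block from a paper.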

Finally, know your machine learning and statistics fundamentals. Understand overfitting, model capacity, cross validation, probability, model ensembles, information theory, and so on. Know when a simpler model is more appropriate.

[1] ftp://ftp.idsia.ch/pub/juergen/fki-207-95.ps.gz

[2] http://arxiv.org/abs/1409.1259

[3] http://arxiv.org/abs/1505.00387

[4] http://arxiv.org/abs/1502.03167

[5] http://arxiv.org/abs/1512.03385


So you don't think some of these details will be automated away in the near future, so that operating a neural network no longer requires a specialist?
Already, it's not nearly as hard as this demo makes it look. There's one recent advance in particular that isn't in this demo, and that is Batch Normalization.

If you've played around with it a bit, I'm sure you have seen that deeper layers are hard to train... You see the dashed lines representing signal in the network become weaker and weaker as the network gets deeper. BatchNorm works wonders here. It computes statistics (mean and variance) over the minibatch of training examples and uses them to normalize the activations, so that the next layer gets input more similar to what it expects, even if the previous layer has changed. In practice you get a much better signal, so the network can learn a lot more efficiently.
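For the curious, the training-time normalization step can be sketched in a few lines of plain Python. This is a rough sketch for a single feature, not a full implementation: `gamma` and `beta` stand in for the learnable scale and shift, `eps` is a small constant for numerical stability, and the running statistics used at inference time are left out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a minibatch, then scale and shift.

    batch: list of floats, one activation per training example.
    gamma, beta: scale and shift (learned in practice, fixed here).
    """
    n = len(batch)
    mean = sum(batch) / n
    # Biased (divide by n) variance of the minibatch.
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


# Activations that have drifted to a large offset with a small spread.
activations = [100.2, 100.4, 99.9, 100.1]
normalized = batch_norm(activations)
print(normalized)  # roughly zero mean and unit variance
```

Whatever the previous layer does to the offset and spread of its outputs, the next layer sees values on roughly the same scale, which is the "input more similar to what it expects" part.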

Without BatchNorm, more than two hidden layers is tedious and error-prone to train. With it, you can train 10-12 layers easily. (With another recent advance, residual nets, you can train hundreds!)

Such advances push the boundary between what you can train easily and what still requires GSD ("graduate student descent": figuring out just the right parameters to get something to work through intuition, trial, and error). You still have to watch out for overfitting, but the nice thing about that is that more training data helps.

I think most of those things will remain important:

+ Designing the network architecture is a means to instill your knowledge of the problem into the network. For example, using convolutions over images encodes some translational invariance into the network. It makes up for lack of data. I don't think data augmentation alone is enough, either: if you use a "stupid" architecture with heaps of data, the computation will become too expensive or slow.

- The systems engineering part will probably get automated. I bet there are Amazon engineers crying at their desks while working on AWS Elastic Tensorshift right now. So unless you're specifically interested in that side of things, maybe this isn't the best area to focus on.

+ There are always going to be problems, so knowing how to debug is a useful skill.

+ ML/stats fundamentals aren't going away. You need to know what you're trying to do before you can do it.
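
To illustrate the convolution point from the first bullet, here is a small 1D sketch in plain Python. The `conv1d` helper and the toy signal are assumptions for illustration. Strictly speaking, the convolution output shifts along with the input (equivariance), and it is pooling on top of it that gives invariance.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal (no padding, stride 1).

    This is cross-correlation, which is what deep learning libraries
    usually call convolution.
    """
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


kernel = [1, 0, -1]                   # simple edge detector
signal = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0, 0]    # same pattern, moved right by one

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)

print(out)          # [-1, -2, 0, 2, 1, 0]
print(out_shifted)  # [0, -1, -2, 0, 2, 1]

# The response moves with the pattern: same values, one step later.
print(out_shifted[1:] == out[:-1])    # True

# A global max pool no longer cares where the pattern was.
print(max(out) == max(out_shifted))   # True
```

The same three weights detect the pattern wherever it appears, so the network does not need separate training examples for every position.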