Hacker News
by vundervul 4348 days ago
Who is Arno Candel and why should we pay attention to his tips on training neural networks? Anyone who suggests grid search for metaparameter tuning is out of touch with the consensus among experts in deep learning. A lot of people are coming out of the woodwork and presenting themselves as experts in this exciting area because it has had so much success recently, but most of them seem to be beginners. Having lots of beginners learning is fine and healthy, but a lot of these people act as if they are experts.
1 comments

His LinkedIn profile (http://www.linkedin.com/in/candel) looks pretty legit to me. I wouldn't want to get into an ML dick-measuring contest with him anyway. H2O looks awesome too.

I think you are misinterpreting what he is saying about grid search. The grid search is just to narrow the field of parameters initially; he doesn't say how he would proceed after that point.

Just curious, what do you consider the state of the art? Bayesian optimization? Wouldn't starting with a grid search be like using a uniform prior?

The rest of his suggestions looked on point to me. Did you see anything else you would disagree with? (I ask sincerely, for my own education.)

The point of Bayesian optimization is that, once you've evaluated your first parameter setting (or round of settings, if you're running in parallel), you no longer have a uniform prior; you have a posterior: that evaluation gave you new information, and you should use that to be smart about where you evaluate next instead of just continuing with a fixed grid.
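To make the "use the posterior to pick the next point" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the one-dimensional `objective` is a hypothetical stand-in for a validation score, the surrogate is a zero-mean Gaussian process with an RBF kernel, and the next point is chosen with an upper-confidence-bound rule. Real tools differ in all of these choices.

```python
import math

def rbf(a, b, length_scale=0.2):
    """RBF kernel: similarity between two scalar parameter settings."""
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def posterior(xs, ys, x_new, noise=1e-6):
    """GP posterior mean and variance at x_new given evaluations (xs, ys)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, solve(K, k_star)))
    return mean, max(var, 0.0)

def objective(x):
    """Hypothetical validation score; its best setting is x = 0.7."""
    return -(x - 0.7) ** 2

def bayes_opt(n_iter=10, kappa=2.0):
    xs = [0.0, 1.0]                      # initial round of settings
    ys = [objective(x) for x in xs]
    candidates = [i / 100 for i in range(101)]
    for _ in range(n_iter):
        def ucb(x):
            # Optimistic score: posterior mean plus an uncertainty bonus.
            mean, var = posterior(xs, ys, x)
            return mean + kappa * math.sqrt(var)
        # Each new point depends on everything evaluated so far.
        x_next = max((c for c in candidates if c not in xs), key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

best_x, best_y = bayes_opt()
```

The contrast with a fixed grid is the line that picks `x_next`: the grid decides every point up front, while this loop re-fits the model after each evaluation before choosing where to look next.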

There's also an argument (http://jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf) that random search is better than grid search, because when only a few of the parameters really matter (but you don't know which ones), grid search wastes effort on scanning the unimportant parameters with the important parameters held fixed, but each point in a random search evaluates a new setting of the important parameters.
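A small sketch of that argument, under assumptions of my own: two parameters scaled to [0, 1], a budget of nine evaluations, and the first parameter playing the role of the one that matters. The function names are hypothetical and not taken from the paper.

```python
import itertools
import random

def grid_points(values_per_axis, n_params):
    """Every combination of evenly spaced values in [0, 1] (needs values_per_axis >= 2)."""
    axis = [i / (values_per_axis - 1) for i in range(values_per_axis)]
    return list(itertools.product(axis, repeat=n_params))

def random_points(n_points, n_params, seed=0):
    """Independent uniform samples from the unit hypercube."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(n_params)) for _ in range(n_points)]

def distinct_values(points, index):
    """How many different settings of one parameter were actually tried."""
    return len({p[index] for p in points})

grid = grid_points(values_per_axis=3, n_params=2)    # 3 x 3 = 9 evaluations
rand = random_points(n_points=len(grid), n_params=2)  # same budget

grid_distinct = distinct_values(grid, 0)
rand_distinct = distinct_values(rand, 0)
```

With the same budget of nine runs, the grid tries only three distinct settings of the first parameter, while the random sample tries a different setting on every run. If the second parameter turns out not to matter, six of the grid's nine runs told you nothing new.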

All this said, certainly grid search is way better than not optimizing parameters at all. My guess is that was the spirit in which this suggestion was made, so I wouldn't take it as a reason to discount the guy.

Bayesian optimization > random search > grid search

Grid search is nothing like a uniform prior since you would never get a grid search-like set of test points in a sample from a uniform prior.

I didn't really want to write a list of criticisms of someone who is presumably a smart and earnest gentleman, or of the similarly smart and earnest woman who summarized the tips from his talk, but here goes:

The H2O architecture looks like a great way to get a marginal benefit from lots of computers and is not something that actually solves the parallelization problem well at all.

Using reconstruction error of an autoencoder for anomaly detection is wrong and dangerous so it is a bad example to use in a talk.

Adadelta isn't necessary, and great results can be obtained with much simpler techniques. It is a perfectly good thing to use, but it isn't something I would put on a list of tips.

In general, the list of tips just doesn't seem very helpful.

I appreciate you taking the time to give more detailed criticism. I learned from it. Thank you.