by idunning 3978 days ago
I was really impressed that the author included this caveat:

> A word on procedure: In this section, we've smoothly moved from single hidden-layer shallow networks to many-layer convolutional networks. It's all seemed so easy! We make a change and, for the most part, we get an improvement. If you start experimenting, I can guarantee things won't always be so smooth. The reason is that I've presented a cleaned-up narrative, omitting many experiments - including many failed experiments. This cleaned-up narrative will hopefully help you get clear on the basic ideas. But it also runs the risk of conveying an incomplete impression. Getting a good, working network can involve a lot of trial and error, and occasional frustration. In practice, you should expect to engage in quite a bit of experimentation.

There is a lot of "magical thinking" amongst people not actively doing research in the area (and maybe a bit within that community too), and I think it at least partly stems from mainly seeing very successful nets, and never seeing the many failed ideas before those network structures and hyperparameters were hit upon - a sampling bias type thing, where you only read about the things that work.


Yes, the difficulty of finding the right hyperparameters is often overlooked, and it is a very frustrating part of creating a model. Methods like grid search just don't work, because of the number of parameters to tune and the time it takes to train a network.
Actually, random search works a lot better than grid search for hyperparameter optimization. Usually only a small number of hyperparameters actually matter; the trick is figuring out which ones. Grid search wastes time on irrelevant dimensions, because it re-tests the same few values of the important parameters at every setting of the unimportant ones.

That said, any sort of hyperparameter optimization is extremely computationally intensive so random search is far from a panacea.
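
To make the "irrelevant dimensions" point concrete, here is a minimal sketch, not anyone's actual tuning code. The `score` function is a made-up stand-in for a validation score in which only `lr` matters and `momentum` is ignored; both names and the peak at 0.37 are assumptions for illustration. With the same budget of 9 trials, the grid only ever tries 3 distinct values of the parameter that matters, while random search tries 9.

```python
import random

def score(lr, momentum):
    # Hypothetical validation score: only `lr` matters (peak at 0.37),
    # `momentum` is deliberately irrelevant.
    return -(lr - 0.37) ** 2

def grid_search(points_per_axis):
    # Evenly spaced values in [0, 1] on each axis, all combinations tried.
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    trials = [(lr, m) for lr in axis for m in axis]
    return max(trials, key=lambda t: score(*t)), trials

def random_search(n_trials, seed=0):
    # Each trial draws both parameters independently and uniformly from [0, 1).
    rng = random.Random(seed)
    trials = [(rng.random(), rng.random()) for _ in range(n_trials)]
    return max(trials, key=lambda t: score(*t)), trials

grid_best, grid_trials = grid_search(3)    # 3 x 3 = 9 trials
rand_best, rand_trials = random_search(9)  # same budget of 9 trials

# How many different values of the important parameter did each method try?
distinct_lr_grid = len({lr for lr, _ in grid_trials})
distinct_lr_rand = len({lr for lr, _ in rand_trials})
print(distinct_lr_grid, distinct_lr_rand)
```

Neither method is guaranteed to win on any single run; the sketch only shows why random search covers the important axis more densely for the same cost.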

So when you search randomly and arrive at a set of optimised parameters, how do you know it can't be optimised any further, since you haven't looked at all possible sets as you would in a grid?
You generally don't know whether you've reached a suitable maximum, which is why it is good to run a nondeterministic optimizer a few times (if computation power allows) and see whether any parameters come out reliably from there.

There are also somewhat better-than-random strategies such as Bayesian optimization and particle swarm optimization that can help you to search more efficiently.
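
The "run it a few times" check could look roughly like the sketch below. It assumes a toy one-parameter objective (`score`, with a made-up peak at 0.3) in place of a real training run, and uses plain random search as the nondeterministic optimizer. If the best value found is similar across seeds, that is some evidence, though not proof, that the search is landing near a real optimum and not a fluke.

```python
import random
import statistics

def score(lr):
    # Hypothetical objective standing in for a validation score; peak at 0.3.
    return -(lr - 0.3) ** 2

def random_search(n_trials, seed):
    # One independent run: sample candidates uniformly in [0, 1), keep the best.
    rng = random.Random(seed)
    candidates = [rng.random() for _ in range(n_trials)]
    return max(candidates, key=score)

def repeated_search(n_runs, n_trials):
    # Repeat the search with different seeds and summarise the best values.
    bests = [random_search(n_trials, seed) for seed in range(n_runs)]
    return bests, statistics.mean(bests), statistics.pstdev(bests)

bests, mean_best, spread = repeated_search(n_runs=5, n_trials=200)
print(bests, mean_best, spread)
```

A small `spread` relative to the parameter's range suggests the result is reliable; a large one suggests either too few trials or a parameter that does not matter much.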

Grid search never exhausts the search space either, at least if the dimensions are continuous.