by netheril96 3889 days ago
A GoogleNet or VGGNet has tons of parameters. How many convolutional layers are stacked together, the size and stride of each one, where to put the dropout layers, where to put the fully connected layers, how they are connected together, the global learning rate, momentum and decay, the local learning rate, momentum and decay: each of these myriad parameters has an unpredictable effect on the final result. The initialization of the network also has a major bearing on the final outcome. It is almost a chaotic system where nothing small can be safely ignored. One time my CNN training result was swung by the `batch_size` parameter, and to this day I don't know how.

Those parameters are exactly the type of handcrafted optimizations I am talking about. You cannot just fill in arbitrary numbers and expect the network to fare well. In fact, you cannot even expect it to converge.
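To make the convergence point concrete, here is a minimal sketch. It is not a CNN, just plain gradient descent on the toy function f(x) = x**2, and the function name and step count are made up for illustration. The only thing that changes between the two runs is the learning rate.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(x) = x**2 with plain gradient descent; return the final x."""
    x = start
    for _ in range(steps):
        grad = 2.0 * x              # derivative of x**2
        x -= learning_rate * grad   # update is x * (1 - 2 * learning_rate)
    return x


# With learning_rate = 0.1 each step multiplies x by 0.8, so x shrinks toward 0.
small = gradient_descent(0.1)

# With learning_rate = 1.1 each step multiplies x by -1.2, so |x| grows without bound.
large = gradient_descent(1.1)

print(abs(small) < 1e-3, abs(large) > 1e3)
```

On this toy problem the safe range is easy to derive (any learning rate strictly between 0 and 1). For a deep network there is no such closed form, which is why the value has to be found by trial.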

You can take those papers and build a world-class classifier only because someone else has already taken the time to optimize for that specific case. Once you switch the task, the result will be OK, but nowhere close to what a human or a true AI would give you. Not until you take the time to optimize the parameters.

2 comments

A GoogleNet or VGGNet has tons of parameters.

Kinda, but they are defined for you. For example, the GoogLeNet design is described in [1]. Page 5 lists the parameters, and the diagram on page 6 shows how the layers are linked.

Yes, I agree that the design of a new neural network architecture is a skilled process, and there is a lot of hard work there. I couldn't agree with that more, but that isn't what we are talking about here.

It is quite possible to take a CNN like GoogLeNet designed for a specific purpose and reuse it in similar situations. GoogLeNet will always do pretty well for image classification.

I think of it as analogous to a piece of software like a database. Designing a new database system is hard, but taking something like SQLite and using it is easy. Yes, you can tune it and get better performance out of it, and yes, it will break if you use it in the wrong circumstances, but it is generally pretty reliable if used as designed.

Now this analogy breaks down because industrial use of CNNs is pretty new compared to database systems. It's more like trying to get msql[2] running on your Slackware 0.9 system in 1993 than it is getting Postgres on Ubuntu 15.10.

Nevertheless, there isn't really a black art to using an existing CNN. Lots of schlepping to get CUDA running on your machine, though.

[1] http://www.cv-foundation.org/openaccess/content_cvpr_2015/pa...

[2] Not MySQL, msql: https://en.wikipedia.org/wiki/MSQL

With the training/test data sets, wouldn't it be possible to find the best parameters with a genetic algorithm? I mean, sure, it'd take really long ... well, probably too long.
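A rough sketch of what that genetic-algorithm search could look like. Everything here is an assumption for illustration: the three hyperparameters, their bounds, and especially `toy_score`, which is a made-up stand-in for "train the network with these settings and return its validation accuracy".

```python
import random

# Hypothetical search space: (low, high) bounds per hyperparameter.
BOUNDS = {
    "learning_rate": (1e-4, 1.0),
    "momentum": (0.0, 0.99),
    "dropout": (0.0, 0.9),
}


def toy_score(params):
    """Stand-in for a full training run. Peaks at an arbitrary made-up optimum."""
    target = {"learning_rate": 0.01, "momentum": 0.9, "dropout": 0.5}
    return -sum((params[k] - target[k]) ** 2 for k in BOUNDS)


def random_individual(rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    # Each hyperparameter is inherited from one parent or the other.
    return {k: rng.choice((a[k], b[k])) for k in BOUNDS}


def mutate(ind, rng, rate=0.3, scale=0.1):
    # Perturb some values with Gaussian noise, then clamp back into bounds.
    child = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            child[k] += rng.gauss(0.0, scale * (hi - lo))
            child[k] = min(hi, max(lo, child[k]))
    return child


def evolve(score, generations=30, pop_size=20, elite=4, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        parents = population[:elite]  # elitism: the best survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    best = max(population, key=score)
    history.append(score(best))
    return best, history


best, history = evolve(toy_score)
print(best, history[-1])
```

The catch is the cost. With these defaults the loop scores roughly 600 candidates, and in the real setting every one of those calls is a full training run, which is where the "probably too long" comes from.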