HN Mirror

by nl 3887 days ago

This is more wrong than right.

The example is correct, but not for the reasons stated. Humans are very, very good at face recognition. However, CNNs are pretty close to human performance for face detection.

Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability. Without much domain specific tuning, an AI's insight is far from reliable.

This just isn't the case. Take the GoogLeNet or VGGNet papers, build the CNN as described using Caffe/whatever, train as described in the paper and you'll end up with something that is pretty much on par with human performance for categorizing ImageNet images.

Take that same CNN architecture, and retrain it for another domain and it will perform roughly as well there too, for the task of categorizing into ~1K-10K image classes.

This isn't domain specific tuning. It's domain specific training, which is very different (although collecting the data is a big job).
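To make the training-versus-tuning distinction concrete, here is a toy sketch. It uses plain logistic regression rather than a CNN, and the datasets, function names, and hyperparameter values are all made up for illustration. The recipe stays fixed and only the training data changes per domain.

```python
import math
import random

# Fixed "recipe": these hyperparameters are never changed between domains.
# (Hypothetical values chosen for this toy problem.)
LEARNING_RATE = 0.5
EPOCHS = 200


def sigmoid(z):
    # Clamp z so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, seed=0):
    """Fit a logistic regression on (features, label) pairs with plain SGD."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    data = list(samples)
    for _ in range(EPOCHS):
        rng.shuffle(data)
        for x, y in data:
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            weights = [w - LEARNING_RATE * err * xi for w, xi in zip(weights, x)]
            bias -= LEARNING_RATE * err
    return weights, bias


def accuracy(model, samples):
    """Fraction of samples whose predicted class matches the label."""
    weights, bias = model
    hits = 0
    for x, y in samples:
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        hits += int((z > 0) == (y == 1))
    return hits / len(samples)


def make_domain(rule, n=200, seed=1):
    """Generate a toy labelled dataset; `rule` decides the class of each point."""
    rng = random.Random(seed)
    points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    return [(x, 1 if rule(x) else 0) for x in points]


# Two different "domains": same input shape, different labelling rule.
rule_a = lambda x: x[0] + x[1] > 0
rule_b = lambda x: x[0] - 2 * x[1] > 0.2

# Domain-specific training: new data, identical recipe.
model_a = train(make_domain(rule_a, seed=1))
model_b = train(make_domain(rule_b, seed=2))

# Evaluate each model on held-out data from its own domain.
acc_a = accuracy(model_a, make_domain(rule_a, seed=11))
acc_b = accuracy(model_b, make_domain(rule_b, seed=12))
print(acc_a, acc_b)
```

Nothing in `train` was adjusted for the second domain; the only thing that changed was the data passed in. Whether a real CNN transfers this cleanly is exactly what the rest of the thread argues about.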

Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability.

For CNNs, this is pretty much entirely false.

netheril96 3887 days ago

A GoogLeNet or VGGNet has tons of parameters: how many convolutional layers are stacked together, the size and stride of each one, where to put the dropout layers, where to put the fully connected layers, how they are connected together, the global learning rate, momentum, and decay, and the local learning rate, momentum, and decay. Each of these myriad parameters has an unpredictable effect on the final result. The initialization of the network also has a major bearing on the final outcome. It is almost a chaotic system where nothing small can be safely ignored. One time the result of training one of my CNNs swung on the `batch_size` parameter, and to this day I don't know why.

Those parameters are exactly the type of handcrafted optimizations I am talking about. You cannot just fill in arbitrary numbers and expect the network to fare well. In fact, you cannot even expect it to converge.
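As a rough sense of scale, the sketch below counts the configurations in a small grid over the kinds of parameters listed above. The candidate values and the one-GPU-hour-per-run cost are assumptions picked for illustration, not taken from the GoogLeNet or VGGNet papers.

```python
from itertools import product
from math import prod

# Hypothetical candidate values for a handful of the parameters mentioned above.
search_space = {
    "num_conv_layers": [4, 8, 16],
    "kernel_size": [3, 5, 7],
    "stride": [1, 2],
    "dropout_rate": [0.0, 0.25, 0.5],
    "learning_rate": [0.1, 0.01, 0.001],
    "momentum": [0.0, 0.9],
    "weight_decay": [0.0, 1e-4, 5e-4],
    "batch_size": [32, 64, 128, 256],
}


def all_configs(space):
    """Yield every combination in the grid as a dict of parameter -> value."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


# Size of the full grid: the product of the number of options per parameter.
num_configs = prod(len(values) for values in search_space.values())

# Assumed cost of one training run (hypothetical: 1 GPU-hour).
gpu_hours_per_run = 1
total_days = num_configs * gpu_hours_per_run / 24

first_config = next(all_configs(search_space))
print(num_configs, total_days)
```

Even this coarse grid has 3888 combinations, about 162 days of sequential training under the assumed cost, which is why these values tend to be chosen by hand or inherited from a paper rather than searched exhaustively.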

You can take those papers and build a world class classifier only because someone else has taken all the time to optimize for the specific case. Once you switch the task, the result will be OK, but nowhere close to what a human or a true AI would give you. Not until you take the time to optimize the parameters.

nl 3887 days ago

A GoogLeNet or VGGNet has tons of parameters.

Kinda, but they are defined for you. For example, the GoogLeNet design is described in [1]. Page 5 lists the parameters, and the diagram on page 6 shows how the layers are linked.

Yes, I agree that the design of a new neural network architecture is a skilled process, and there is a lot of hard work there. I couldn't agree with that more, but that isn't what we are talking about here.

It is quite possible to take a CNN like GoogLeNet designed for a specific purpose and reuse it in similar situations. GoogLeNet will always do pretty well for image classification.

I think of it as analogous to a piece of software like a database. Designing a new database system is hard, but taking something like SQLite and using it is easy. Yes, you can tune it and get better performance out of it, and yes, it will break if you use it in the wrong circumstances, but it is generally pretty reliable if used as designed.

Now this analogy breaks down because industrial use of CNNs is pretty new compared to database systems. It's more like trying to get msql[2] running on your Slackware 0.9 system in 1993 than it is like getting Postgres running on Ubuntu 15.10.

Nevertheless, there isn't really a black art to using an existing CNN. Lots of schlepping to get CUDA running on your machine, though.

[1] http://www.cv-foundation.org/openaccess/content_cvpr_2015/pa...

[2] Not MySQL, msql: https://en.wikipedia.org/wiki/MSQL

stefs 3885 days ago

with the training/test data sets, wouldn't it be possible to find the best parameters with a genetic algorithm? i mean, sure, it'd take really long ... well, probably too long.
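A minimal sketch of that idea, assuming a small discrete search space. Here `toy_fitness` is a hypothetical stand-in for "train the network with this config and return validation accuracy", and all names and values are invented for illustration.

```python
import random

# Hypothetical search space; real candidates would depend on the network.
SPACE = {
    "learning_rate": [0.1, 0.03, 0.01, 0.003, 0.001],
    "momentum": [0.0, 0.5, 0.9, 0.99],
    "batch_size": [16, 32, 64, 128, 256],
    "dropout_rate": [0.0, 0.25, 0.5, 0.75],
}


def toy_fitness(cfg):
    """Stand-in for a full training run; higher is better (made-up formula)."""
    score = 1.0
    score -= abs(cfg["learning_rate"] - 0.01) * 5
    score -= abs(cfg["momentum"] - 0.9) * 0.3
    score -= abs(cfg["batch_size"] - 64) / 1000
    score -= abs(cfg["dropout_rate"] - 0.5) * 0.2
    return score


def random_config(rng):
    return {name: rng.choice(values) for name, values in SPACE.items()}


def crossover(a, b, rng):
    # Each parameter is inherited from one of the two parents at random.
    return {name: rng.choice([a[name], b[name]]) for name in SPACE}


def mutate(cfg, rng, rate=0.2):
    # With probability `rate`, resample a parameter from its candidates.
    return {
        name: (rng.choice(SPACE[name]) if rng.random() < rate else value)
        for name, value in cfg.items()
    }


def evolve(generations=20, pop_size=12, elite=2, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        # A real run would cache fitness: each call is a full training job.
        population.sort(key=toy_fitness, reverse=True)
        history.append(toy_fitness(population[0]))
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        # Elitism: the top configs survive unchanged into the next generation.
        population = population[:elite] + children
    best = max(population, key=toy_fitness)
    return best, history


best, history = evolve()
print(best, toy_fitness(best))
```

The catch is the one already raised in the comment: every fitness evaluation is a complete training run, so even this small setup (20 generations of 12 configs) means a couple of hundred trainings.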
