This is more wrong than right. The example is correct, but not for the reasons stated. Humans are very, very good at face recognition. However, CNNs are pretty close to human performance for face detection.
"Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability. Without much domain-specific tuning, an AI's insight is far from reliable."
This just isn't the case. Take the GoogLeNet or VGGNet papers, build the CNN as described using Caffe or a similar framework, and train it as described in the paper. You'll end up with something pretty much on par with human performance at categorizing ImageNet images. Take that same CNN architecture and retrain it for another domain, and it will perform roughly as well there too, for the task of categorizing into ~1K-10K image classes.
This isn't domain-specific tuning. It's domain-specific training, which is very different (although collecting the data is a big job).
"Only after you apply tons of handcrafted optimizations, which are mostly black art, will you get close to or surpass a human's capability."
For CNNs, this is pretty much entirely false.
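Below is a toy sketch of the "training, not tuning" distinction. It uses a much simpler model than a CNN: a binary logistic-regression classifier. The hyperparameters (the hypothetical `LEARNING_RATE` and `EPOCHS`) stay fixed, and only the synthetic training data changes between two made-up domains. It says nothing about how well a real ImageNet architecture transfers. It only shows what "same architecture, same recipe, new data" looks like in code.

```python
import math
import random

# Hypothetical training "recipe": reused unchanged for every domain.
LEARNING_RATE = 0.1
EPOCHS = 50


def make_blobs(centers, n_per_class, spread, seed):
    """Labelled 2-D points drawn around each class center (class i = centers[i])."""
    rng = random.Random(seed)
    data = [
        ((rng.gauss(cx, spread), rng.gauss(cy, spread)), label)
        for label, (cx, cy) in enumerate(centers)
        for _ in range(n_per_class)
    ]
    rng.shuffle(data)
    return data


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, learning_rate=LEARNING_RATE, epochs=EPOCHS):
    """Binary logistic regression fitted with per-example gradient descent."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            g = sigmoid(w1 * x1 + w2 * x2 + b) - y  # d(log-loss)/d(logit)
            w1 -= learning_rate * g * x1
            w2 -= learning_rate * g * x2
            b -= learning_rate * g
    return w1, w2, b


def accuracy(data, params):
    """Fraction of points whose predicted class matches the label."""
    w1, w2, b = params
    hits = sum((w1 * x1 + w2 * x2 + b > 0) == bool(y) for (x1, x2), y in data)
    return hits / len(data)


# Two hypothetical "domains": same model, same recipe, different data.
domains = {
    "A": [(-2.0, -2.0), (2.0, 2.0)],
    "B": [(-2.0, 2.0), (2.0, -2.0)],
}
results = {}
for name, centers in domains.items():
    train_set = make_blobs(centers, 100, 0.7, seed=1)
    test_set = make_blobs(centers, 100, 0.7, seed=2)
    results[name] = accuracy(test_set, train(train_set))
    print(name, results[name])
```

Only the data in `domains` differs between the two runs. `train` and its hyperparameters are identical.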
The training parameters in those papers are exactly the kind of handcrafted optimizations I am talking about. You cannot just fill in arbitrary numbers and expect the network to do well. In fact, you cannot even expect it to converge.
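Here is a one-dimensional sketch of why arbitrary values can stop training from converging. It assumes the simplest possible loss, f(w) = 0.5 * c * w**2, with a hypothetical curvature c. Each gradient-descent step multiplies w by (1 - lr * c). The iterates therefore shrink only when 0 < lr < 2/c and blow up otherwise. A real CNN loss surface is far more complicated. This illustrates the failure mode, not a tuning method.

```python
def gradient_descent(curvature, learning_rate, steps, w0=1.0):
    """Fixed-step gradient descent on f(w) = 0.5 * curvature * w**2.

    The gradient is curvature * w, so every step computes
    w <- (1 - learning_rate * curvature) * w, which converges to the
    minimum at 0 only if 0 < learning_rate < 2 / curvature.
    """
    w = w0
    for _ in range(steps):
        w -= learning_rate * curvature * w
    return w


CURVATURE = 10.0  # hypothetical; stability threshold is 2 / 10 = 0.2
for lr in (0.01, 0.19, 0.21, 0.5):
    final = abs(gradient_descent(CURVATURE, lr, steps=100))
    print(f"lr={lr}: |w| after 100 steps = {final:.3g}")
```

Going from 0.19 to 0.21 is a small change, yet it separates convergence from divergence. A real network has many interacting hyperparameters and no closed-form threshold like 2/c.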
You can take those papers and build a world-class classifier only because someone else has already spent the time optimizing for that specific case. Once you switch the task, the result will be OK, but nowhere close to what a human or a true AI would give you, at least not until you take the time to optimize the parameters.