So it works just like I thought it would. Why are CNNs so hyped? Wasn't all this already known decades ago? Or is it just because we can now afford the computing power?
The basic CNN structure was in place, but as the saying goes, "The Devil's in the details." Early CNNs were applied to problems such as handwritten character recognition, took small grayscale images as inputs, and were much shallower, smaller models. Today's CNNs operate on full-resolution, multichannel images and video, and can be orders of magnitude deeper and larger. For instance, ResNets have been reported to keep improving with depth out to around 1,000 layers on benchmark datasets. This would have been unthinkable even a couple of years ago. By way of comparison, even the state-of-the-art VGG architecture of a couple of years ago originally had to be trained in stages to reach 16 and 19 layers for its ILSVRC 2014 submission (Xavier / MSRA initialization makes this unnecessary now). At the time, VGG and GoogLeNet (22 layers) were considered extraordinarily deep CNNs.
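To make the Xavier / MSRA point concrete, here's a minimal sketch in plain Python (no framework assumed) of the two initialization schemes. The variance formulas are the standard ones; the function names, the 3x3 conv layer, and the 16-to-32 channel sizes are hypothetical, picked only for illustration.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform init: W ~ U(-a, a) with a = sqrt(6 / (fan_in + fan_out)),
    which gives Var(W) = a^2 / 3 = 2 / (fan_in + fan_out)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


def he_normal(fan_in, fan_out, rng):
    """He/MSRA normal init, aimed at ReLU layers: W ~ N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def variance(weights):
    """Population variance of all entries in a (fan_out x fan_in) weight matrix."""
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)
    return sum((w - mean) ** 2 for w in flat) / len(flat)


# Hypothetical 3x3 conv layer going from 16 to 32 channels.
# A common convention: fan_in = k*k*C_in, fan_out = k*k*C_out.
rng = random.Random(0)
fan_in, fan_out = 3 * 3 * 16, 3 * 3 * 32

w_xavier = xavier_uniform(fan_in, fan_out, rng)
w_he = he_normal(fan_in, fan_out, rng)

print(f"Xavier: target var {2 / (fan_in + fan_out):.5f}, sampled {variance(w_xavier):.5f}")
print(f"He:     target var {2 / fan_in:.5f}, sampled {variance(w_he):.5f}")
```

The idea in both cases is to scale the initial weights by the layer's fan-in (and, for Xavier, fan-out) so that activations and gradients keep roughly the same magnitude from layer to layer instead of shrinking or blowing up. He/MSRA doubles the variance relative to a plain 1/fan_in rule to compensate for ReLU zeroing out about half its inputs. That rough balance is what lets a 19-layer net train from scratch without the staged warm-up VGG originally used.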
The underlying math was figured out a long time ago, but only in recent years have we had the computing power to test these ideas on lots of complicated, real-world classification problems, with some incredible successes.
I argued back in 2000 (a year after I got my computer engineering degree) that AI wouldn't take off until computing moved from single-threaded/single-core to multithreaded/multicore processing. The fact that we're only hearing about this stuff 15 years later makes me think that assertion was largely right.
The biggest problem I see in AI is that the algorithms are generally fairly straightforward, but people haven't had the computing power to explore the problem space. We are seeing drastic improvements in things like video cards (routinely 1000+ cores) and data-processing locality (MapReduce). But general-purpose processors have stagnated.
If we really want AI on any reasonable timescale, we need large arrays of general-purpose cores with a sane communication protocol that doesn't fixate on things like caching. We need a hybrid of Go and Erlang to do concurrent functional programming in a readable way, with automagic scaling over a network. And we need all this yesterday. The fancy schmancy AI algorithms will become apparent when processing power is no longer the primary limitation, and at that point we can optimize them.
Decades ago I played around with neural nets, but was frustrated because I either had to preprocess and normalize my inputs to the point where I no longer needed a network, or had to train a large network on so much data that it was impractical.
A cookbook approach with a catchy name, plus orders of magnitude more processing power, has revived neural nets, and now they are finally doing something useful.
Now everyone is jumping on the bandwagon, so the field is progressing very quickly. Just because it's hyped doesn't mean it's not worth a second look (although I'm still on the sidelines myself).