by dougabug 3558 days ago
The basic CNN structure was in place, but as the saying goes, "the devil's in the details." Early CNNs were applied to problems such as handwritten character recognition, with rows of small grayscale image cells as inputs, and were much shallower, smaller models. Today's CNNs operate on full-resolution, multi-channel images and video, and can be orders of magnitude deeper and larger. For instance, ResNets have been trained with more than 1,000 layers on benchmark datasets, with reported accuracy still improving with depth. This would have been unthinkable even a couple of years ago. By way of comparison, even the state-of-the-art VGG architecture of a couple of years ago originally had to be trained in stages to reach 16 and 19 layers for its ILSVRC 2014 submission (Xavier / MSRA initialization makes this unnecessary now). At the time, VGG and GoogLeNet (22 layers) were considered extraordinarily deep CNNs.
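To make the initialization point concrete, here is a rough NumPy sketch of why the weight scale matters at depth. It is not VGG or anyone's actual training code: it assumes a stack of plain fully connected ReLU layers instead of convolutions, and the width, batch size, depth, and the names `xavier_std`, `msra_std`, and `final_activation_rms` are all made up for illustration. The two formulas are the commonly cited ones, Var(W) = 2 / (fan_in + fan_out) for Xavier and Var(W) = 2 / fan_in for MSRA (He et al.).

```python
import numpy as np


def xavier_std(fan_in, fan_out):
    # Xavier (Glorot) scale: Var(W) = 2 / (fan_in + fan_out)
    return np.sqrt(2.0 / (fan_in + fan_out))


def msra_std(fan_in):
    # MSRA (He et al.) scale: Var(W) = 2 / fan_in, derived for ReLU units
    return np.sqrt(2.0 / fan_in)


def final_activation_rms(weight_std, depth=19, width=256, seed=0):
    """Push random inputs through `depth` ReLU layers and return the
    root-mean-square activation at the top of the stack.
    Illustrative stand-in only: dense layers, no biases, no training."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((128, width))  # batch of unit-variance inputs
    for _ in range(depth):
        w = rng.standard_normal((width, width)) * weight_std
        x = np.maximum(x @ w, 0.0)  # linear layer followed by ReLU
    return float(np.sqrt(np.mean(x ** 2)))


# A small fixed scale (0.01 is an arbitrary choice here) versus the MSRA scale.
naive_rms = final_activation_rms(0.01)
msra_rms = final_activation_rms(msra_std(256))

print("fixed std 0.01:", naive_rms)
print("MSRA std:      ", msra_rms)
```

With the small fixed scale the signal shrinks at every layer and is effectively gone by layer 19, whereas the MSRA scale keeps it at roughly the same magnitude as the input. Under these assumptions, that is the gist of why a deep stack can be trained in one go instead of in stages.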