"Classical" CV and deep-learning CV needn't be opposed to one another. There are several cases in which a deep network emulates the classical approach, implementing the same carefully thought-out pipeline but in a way that leverages representations learned from huge datasets (which are undeniably very powerful). Some examples:

* Bags of convolutional features for scalable instance search https://arxiv.org/pdf/1604.04653.pdf
  This paper treats each 'pixel' of a CNN activation tensor as a local descriptor, clusters them, and describes an image as a bag-of-visual-words histogram.

* Learned Invariant Feature Transform https://arxiv.org/abs/1603.09114v2
  This paper very explicitly emulates the entire SIFT pipeline for computing correspondences across pairs of images.

* Inverse compositional spatial transformer networks https://arxiv.org/abs/1612.03897v1
  This paper emulates the Lucas-Kanade approach to computing the transform between two images, using differentiable (trainable) components.

Also, don't forget that deformable part models are convolutional networks! https://arxiv.org/abs/1409.5403
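To make the first example concrete, here is a rough sketch of the bag-of-visual-words idea applied to CNN activations. It is not the paper's implementation: the random arrays stand in for real activation tensors, and the helper names, the L2 normalization, the tiny k-means, and the codebook size of 16 are all assumptions made for illustration.

```python
import numpy as np

def local_descriptors(activations):
    """Turn a (C, H, W) activation tensor into H*W descriptors of length C."""
    c, h, w = activations.shape
    desc = activations.reshape(c, h * w).T  # one row per spatial location
    norms = np.linalg.norm(desc, axis=1, keepdims=True)
    return desc / np.maximum(norms, 1e-12)  # L2-normalize (an assumption)

def assign(points, centroids):
    """Index of the nearest centroid (visual word) for every descriptor."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(points, k, iters=20, seed=0):
    """Very small k-means, only meant to build a toy codebook."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    return centroids

def bow_histogram(activations, centroids):
    """Describe one image as a normalized histogram over visual words."""
    words = assign(local_descriptors(activations), centroids)
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()

# Hypothetical stand-in for conv-layer outputs of 5 images: 64 channels, 8x8 grid.
rng = np.random.default_rng(0)
images = [rng.random((64, 8, 8)) for _ in range(5)]

# Build the codebook from the descriptors of all images, then encode one image.
all_desc = np.concatenate([local_descriptors(a) for a in images])
codebook = kmeans(all_desc, k=16)
hist = bow_histogram(images[0], codebook)
```

With real activations, each image collapses to one fixed-length histogram, so retrieval can reuse the same inverted-index machinery as classical bag-of-words search.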
Conditional Random Fields as Recurrent Neural Networks https://arxiv.org/abs/1502.03240
I hope more fruit comes out of the fusion of deep learning and graphical models.