by pakl
2591 days ago
Deep convolutional networks, by design, are unable to integrate contextual and ambient information present in an image (or in preceding images) to inform how they interpret the local features they use. So it's no surprise they struggle with unconstrained images, where ambient context varies.

It's intriguing how much focus there is on adversarial examples. You don't need adversarial examples to make a deep network fail; in a sense that's overkill. Just point the poor deep network at a sequence of images from the real world: images from a self-driving car, security camera, or webcam. You'll see it make spontaneous errors, no matter how much training data you gave it.

The field will advance when/if practitioners recognize that classifying pixel patterns in isolation isn't sufficient for robust visual perception, and adopt alternative neural network designs that can interpret what they perceive in light of (no pun intended) context and physical expectations. It worked for our prototype.[0]

[0] https://arxiv.org/abs/1607.06854
We are essentially building a frog with better and better visual perception in the hope that it could become a taxi driver. It will become a totally amazing super-frog with super-vision, but it's still just a frog with frog-like visual perception and limits. Using the equivalent of a pre-attentive feature recognition stage for complex object recognition can fake human-like object recognition when we force it, but it's the wrong approach. We get these catastrophic failures because we hit the limits.
Features seem to exist independently of one another in the early processing stages of human perception. They are not associated with a specific object either. Human perception does not gradually turn features into objects the way we do in deep learning. How to properly distinguish feature integration from feature detection, and how to implement it, is an open question.