by MAXPOOL 2595 days ago
Learning multilayer convolutional representations of statistical features is roughly equivalent to taking the first few layers of the visual cortex and stacking them. Creating higher and higher stacks is not going to solve vision.

We are essentially building a frog with better and better visual perception in the hope that it could become a taxi driver. It will become a totally amazing super-frog with super-vision, but it's still just a frog with frog-like visual perception and limits. Using the equivalent of a pre-attentive feature-recognition stage for complex object recognition can fake human-like object recognition when we force it, but it's the wrong approach. We get these catastrophic failures because we hit the limits.

Features seem to exist independently from one another in the early processing stages of human perception. They are not associated with a specific object either. Human perception does not gradually turn features into objects the way we do in deep learning. How to properly distinguish feature integration from detection is an open question.

2 comments

And people will be amazed at the totally super-froggy things these super-frogs can do, and understand even less why the super-frogs aren't taxiing already. :-)
They actually are, but self-driving cars use that subsystem as only one component of the whole, and most of the system is not a super-frog.
You are making a lot of incorrect statements about brains and vision. I would advise you to study some visual neuroscience.

> Learning multilayer convolutional representations of statistical features is roughly equivalent to taking the first few layers of the visual cortex and stacking them.

No, it isn't roughly equal to the first few layers of visual cortex. The first few layers of visual cortex have substantial feedback connectivity from higher areas, which affects the responses of even the most peripheral parts. (Citations are in our arXiv preprint linked above.) Most of the brain has more feedback connectivity from elsewhere than feedforward ascending connectivity. This qualitatively affects activations.

> We are essentially building a frog...

I suspect frog vision is far more robust than anything we are "essentially building".

> Features seem to exist independently...

Please have a close look at some modern visual neuroscience. Or speak to a good, honest electrophysiologist.

Which citations are you referring to? I would be grateful if you could please be specific.