| HN Mirror


by pakl 2639 days ago

Deep convolutional networks, by design, are unable to integrate contextual and ambient information present in an image (or in preceding images) to inform how they interpret the local features they use. So it's no surprise they struggle with unconstrained images, that is, images where the ambient context varies.

It's intriguing how much focus there is on adversarial examples. You don't need adversarial examples to make a deep network fail; in a sense that's overkill. Just point the poor deep network at a sequence of images from the real world: images from a self-driving car, security camera, or webcam. You'll see it make spontaneous errors, no matter how much training data you gave it.

The field will advance when/if practitioners recognize that classifying pixel patterns in isolation isn't sufficient for robust visual perception, and adopt alternative neural network designs that can interpret what they perceive in light of (no pun intended) context and physical expectations.

It worked for our prototype.[0]

[0] https://arxiv.org/abs/1607.06854

5 comments

MAXPOOL 2639 days ago

Learning multilayer convolutional representations of statistical features is roughly equal to taking the first few layers of the visual cortex and stacking them. Creating higher and higher stacks is not going to solve vision.

We are essentially building a frog with better and better visual perception in the hope that it could become a taxi driver. It will become a totally amazing super-frog with super-vision, but it's still just a frog with frog-like visual perception and limits. Using the equivalent of a pre-attentive feature recognition stage for complex object recognition can fake human-like object recognition when we force it, but it's the wrong approach. We get these catastrophic failures because we hit the limits.

Features seem to exist independently of one another in the early processing stages of human perception. They are not associated with a specific object either. Human perception does not gradually turn features into objects the way we do in deep learning. How to properly distinguish feature integration from detection, and how to implement it, is an open question.

jacobush 2639 days ago

And people will marvel at the totally super-froggy things these super-frogs can do, and understand even less why the super-frogs aren't taxiing already. :-)

AstralStorm 2639 days ago

They actually are, but self-driving cars use that subsystem as only one component of the whole, and most of the rest is not a super-frog.

pakl 2638 days ago

You are making a lot of incorrect statements about brains and vision. I would advise you to study some visual neuroscience.

> Learning multilayer convolutional representations of statistical features is roughly equal to taking the first few layers of the visual cortex and stacking them.

No, it isn't roughly equal to the first few layers of visual cortex. The first few layers of visual cortex have substantial feedback connectivity from higher areas, which affects the responses of even the most peripheral parts. (Citations in our arXiv preprint linked above.) Most of the brain has more feedback connectivity from elsewhere than feedforward ascending connectivity. This qualitatively affects activations.

>We are essentially building a frog...

I suspect frog vision is far more robust than anything we are "essentially building".

> Features seem to exist independently...

Please have a close look at some modern visual neuroscience. Or speak to a good, honest electrophysiologist.

sgt101 2637 days ago

Which citations are you referring to? I would be grateful if you could please be specific.

ben_w 2639 days ago

What do you mean by “ambient”? If you hadn’t finished your comment with the words “our prototype” I would’ve assumed you meant things such as pictures of wolves having snow in them, and that snow being a clue that they are wolves, but I know that you can’t mean that.

gliop 2639 days ago

When you walk into a grocery store, you assume the fruit isn't plastic. When you walk into a furniture store, you do.

Why? Ambient context.

dec0dedab0de 2639 days ago

> When you walk into a grocery store, you assume the fruit isn't plastic. When you walk into a furniture store, you do.

> Why? Ambient context.

That was a really great way to get the point across, especially because I still sometimes think the fruit is real, even when I know the context.

ben_w 2639 days ago

That’s an example, not an explanation. From only that example, I cannot differentiate “ambient context” from “common sense”, a phrase that means totally different things to everyone I’ve seen use it.

heyitsguay 2639 days ago

Strongly agree with all this. I've been learning the same lessons working on more robust computer vision for biomedical imaging. I bet unsupervised predictive pretraining could be adapted to (static) 3D image volumes. The z axis replaces the t axis, and you predict the next 2D slice from the previous ones. Hmm...
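To make the z-for-t idea concrete, here is a minimal sketch of how the training pairs might be built. It only shows the data arrangement (previous slices as input, next slice as target); it is not the method from the linked preprint, and the names `next_slice_pairs` and `context` are made up for illustration.

```python
from typing import List, Sequence, Tuple

Slice = List[List[float]]  # one 2D slice of the volume, indexed [y][x]


def next_slice_pairs(
    volume: Sequence[Slice], context: int = 2
) -> List[Tuple[List[Slice], Slice]]:
    """Build (previous slices, next slice) pairs along the z axis.

    The z index plays the role that the time index plays in video
    prediction: each target slice is predicted from the `context`
    slices directly before it. (`context` is a hypothetical knob.)
    """
    if context < 1:
        raise ValueError("context must be at least 1")
    pairs = []
    for z in range(context, len(volume)):
        inputs = list(volume[z - context:z])  # slices z-context .. z-1
        target = volume[z]                    # the slice to predict
        pairs.append((inputs, target))
    return pairs


# Toy volume: 5 slices of 2x2 voxels, each slice filled with its z index.
volume = [[[float(z)] * 2 for _ in range(2)] for z in range(5)]
pairs = next_slice_pairs(volume, context=2)
```

Any predictive model could then be trained on `pairs`; which architecture and loss would work well on real biomedical volumes is left open here.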

As an aside - from the paper it looks like you worked at Brain Corp a few years back. Any thoughts on them and what they're doing these days? I'll be looking for a job again soon and I see a lot of ads for them.

pjc50 2639 days ago

> classifying pixel patterns in isolation isn't sufficient for robust visual perception

This seems to be only a very small step forward from Minsky's negative result about "perceptrons".

AstralStorm 2639 days ago

That's because DNNs are only a small step removed from multilayer perceptrons as well. (A few more layers, a tiny bit of internal structure, a more advanced nonlinear activation function, a better training schedule, and much more training data.)

They're not even close to the structural or training-algorithm complexity of natural neural networks yet.

_0ffh 2639 days ago

That result was not about multilayer perceptrons, but perceptrons. But, whatever.
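For readers who have not seen the result in question: its best-known illustration is that a single linear threshold unit cannot compute XOR. The sketch below searches a small grid of weights (the grid and the function names are arbitrary choices for illustration) and finds weights for AND but none for XOR. The search is not a proof by itself. The actual argument is short: XOR would need b ≤ 0, w1 + b > 0, w2 + b > 0 and w1 + w2 + b ≤ 0, but adding the two middle inequalities gives w1 + w2 + b > -b ≥ 0, a contradiction.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]


def threshold_unit(w1, w2, b, x1, x2):
    """Single-layer perceptron: fires iff the weighted sum is positive."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0


def find_weights(targets, grid):
    """Return the first (w1, w2, b) on the grid matching all targets, else None."""
    for w1, w2, b in product(grid, repeat=3):
        if all(
            threshold_unit(w1, w2, b, x1, x2) == t
            for (x1, x2), t in zip(INPUTS, targets)
        ):
            return (w1, w2, b)
    return None


# Illustrative grid: -4.0 to 4.0 in steps of 0.5.
GRID = [i / 2 for i in range(-8, 9)]

and_weights = find_weights(AND, GRID)  # some weight triple is found
xor_weights = find_weights(XOR, GRID)  # None: XOR is not linearly separable
```

Adding a hidden layer removes this particular limitation, which is the distinction being drawn between perceptrons and multilayer perceptrons.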

AstralStorm 2639 days ago

Multilayer perceptrons share many of the same problems single-layer perceptrons have, such as trouble with high-level structure and generating weird nonrobust features. They are much more nonlinear, though, and thus somewhat more powerful. (I'm imprecise here, but it is easy to find papers on this ancient tech from before the AI winter.)

A DNN is essentially one of these with more layers than the typical 4 of an MLP, because we figured out a way to propagate error and training gradients. (Plus a few important but interesting details.) They are not really qualitatively different according to the math they use... The main difference is the use of gated or non-differentiable activation functions, with various ways to compute approximate gradients when faced with this feature. Convolutional nets in particular are similar to MLPs.

marcosdumay 2639 days ago

It seems that we are finally at the point where throwing more hardware/data at a dumb algorithm won't give you much better results. This means that there will be space for smarts in AI again. And this is happening at the same time that throwing more money at general-purpose hardware has stopped yielding good results too, with great opportunities for synergy.

> The field will advance when/if practitioners recognize that classifying pixel patterns in isolation isn't sufficient for robust visual perception

But this, well, it is very clearly sufficient, and we have well-accepted results showing this. It just won't work in practice. That probably means the change will be full of fighting while the old ways still work, and lots of failures and unexpected successes.

p1esk 2638 days ago

> we are finally at the point where throwing more hardware/data at a dumb algorithm won't give you much better results

Recent success of GPT-2 indicates otherwise.