by mannykannot 2775 days ago
I think we actually do know enough about how the human mind processes images to have some idea of what is different. It is not that uncommon for humans to be uncertain about what they are looking at, but the first thing about such occurrences is that the human is usually aware that they are having a problem, and the second is that they take steps to resolve it, such as forming hypotheses about what's going on and checking them, and/or seeking a better view (or other evidence) in a way specifically designed to resolve the uncertainty. It is this higher-level semantic analysis that is missing from current image processing software.

In these discussions, someone always mentions optical illusions, but only humans (so far) understand the concept of 'optical illusion', and recognize that they are experiencing them.

> It is not that uncommon for humans to be uncertain about what they are looking at, but the first thing about such occurrences is that the human is usually aware of the fact that they are having a problem, and the second thing is that they take steps to resolve it

This is true, but step one is "move your head" (or in your words, "get a better view" -- but you get more value from just the fact that your head is in a different place than from the possibility of a better angle on whatever you're looking at).

That strategy doesn't work at all when you're trying to classify static images rather than physical objects.

That raises the interesting question of how object recognition in streams of images is progressing, beyond being just object recognition within the individual frames. Humans are capable of extracting a lot of additional information in such situations, and are actually helped when the perspective on a given object changes. One cannot give current machine vision a pass if, through lacking this capability, it is under-performing.
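
To make the "streams of images" point concrete, here is a minimal sketch of one way evidence from several frames could be combined, rather than classifying each frame in isolation. It assumes a hypothetical per-frame classifier that already outputs class probabilities, and it treats frames as independent evidence, which real consecutive video frames are not. The name `fuse_frames` and the numbers are made up for illustration.

```python
import math

def fuse_frames(frame_probs):
    """Combine per-frame class probabilities into one estimate.

    Sums log-probabilities across frames (i.e. multiplies the
    probabilities, treating frames as independent evidence, which is
    an assumption) and renormalizes so the result sums to 1.
    """
    n_classes = len(frame_probs[0])
    log_scores = [0.0] * n_classes
    for probs in frame_probs:
        for i, p in enumerate(probs):
            # Clamp to avoid log(0) for classes a frame rules out entirely.
            log_scores[i] += math.log(max(p, 1e-12))
    # Subtract the max before exponentiating for numerical stability.
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical outputs for three frames of the same object,
# each only mildly favoring class 0.
frames = [
    [0.5, 0.4, 0.1],
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
]
fused = fuse_frames(frames)
```

With these made-up numbers, the fused estimate favors class 0 more strongly than any single frame does, which is the kind of gain a changing perspective could in principle provide.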

And moving one's head to get a better view is only one thing that a human might do. First, of course, we must recognize that we are having a difficulty, and current machine vision seems to be somewhat deficient in this regard. Then, even without being able to get a different perspective, we will do things like guess what might be there (using our extensive semantic models of the world) and check whether those guesses fit what we see, and/or we might try to extract specific features of the problematic area and search our memories for objects that might plausibly match, bearing in mind that the view might be from a different perspective than we are accustomed to. We are also quite good at estimating whether an object might be a problem for us, even if we have not positively identified it. There is a lot more to it than just moving one's head.
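
As a rough illustration of the first step only (noticing that there is a difficulty), here is a sketch of a classifier wrapper that declines to answer when its output distribution is too spread out. It assumes a hypothetical model that returns class probabilities; the entropy threshold and the name `classify_or_abstain` are illustrative choices, not an established method from any particular library, and this is nowhere near the semantic hypothesis-checking described above.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def classify_or_abstain(probs, labels, max_entropy_fraction=0.5):
    """Return the top label, or None if the prediction is too uncertain.

    Uncertainty is measured as entropy relative to the maximum possible
    entropy for this number of classes (a uniform distribution). The 0.5
    default threshold is an arbitrary assumption.
    """
    max_entropy = math.log(len(probs))
    if entropy(probs) > max_entropy_fraction * max_entropy:
        return None  # "I am having a problem with this image."
    return labels[probs.index(max(probs))]

labels = ["cat", "dog", "fox"]
confident = classify_or_abstain([0.96, 0.02, 0.02], labels)  # peaked
unsure = classify_or_abstain([0.40, 0.35, 0.25], labels)     # spread out
```

An abstention like this could then trigger the follow-up steps: requesting another view, or testing specific hypotheses against the problematic region.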

GP's statement applies as much to observing objects in 3D space as it does to looking at photos, where just moving your head ain't gonna help you much. Optical illusions are great for studying this process, because most of them are delivered in the form of flat images on paper or a computer screen.

Optical illusions are delivered as flat images because moving your head doesn't affect those.

Humans are rarely aware of optical illusions unless they're extreme images they don't see in real life - crawling dots, impossible geometry - or they're explicitly labelled as optical illusions.

Some more subtle examples:

http://www.terrycolon.com/1features/optical-illusions.html

In fact human perceptual processes are only kind of reliable some of the time. Low and/or unusual light, suggestibility, and unusual contexts all have a very negative effect on reliability, but humans are often unaware of this.

Cognitive and semantic illusions are even more persistent. People literally believe all kinds of nonsense, and will carry on believing it even when offered robust evidence that they're wrong.

The point being that human perception and cognition are not some kind of gold standard. They have plenty of issues of their own. But there's a kind of assumption/requirement of perfection with machine intelligence that doesn't apply to human cognition. So bugs in our own evolutionary firmware tend to be overlooked, while equivalent-level bugs in ML are seen as terrible failures which undermine the entire premise of AI.