Human vision works well because our brain has an enormous number of priors to guide it, that is, similar past experiences that explain most of what we are seeing. When your eyes see something, only a small amount of information is passed to your brain (motion, for example). Your brain "fills in" the missing pieces with what it is used to. That's why we don't notice the blind spot where the optic nerve enters the eye, and why we often miss motionless things hiding in plain sight. Illusions happen when these visual priors are deceived. Our brain expects something and makes us see it that way, even though that is not what is actually there (sometimes because the image was engineered to defeat those expectations, like the images linked in this thread). The mental model we have of a typical scene doesn't fit those examples well: our brain was never trained to handle them, I guess because doing so brings no advantage (either through evolution or while learning as a child). Our brain is only trained to extract information efficiently from "plausible images" (lit by sun-like light, seen on Earth, etc.). If you feed it random noise, it will try to explain it with things it knows, a phenomenon called pareidolia.

In machine learning for vision, each experiment learns its priors anew, usually from scratch or by fine-tuning an existing model. Think of the priors as the "default" image(s), in terms of a complex internal representation rather than pixels, that help frame the problem at hand. For a motion detection or tracking problem, the optimal default representation will differ from the one most useful for classification or segmentation. My point with these examples is that machine learning vision is also prone to illusions, that is, images that defeat its internal representation space and/or default representation (because they lie too far from it or are poorly explained by it).
Also, each algorithm (be it a neural network, an SVM, or anything else really) has a different internal representation, so different images will be illusions for it. An illusion for one model won't necessarily be an illusion for another. The thing is, we are far from mastering advanced machine learning: for example, we have no proofs of optimality for the capacity, architecture, or filters of a deep neural network for a given task. There is a lot of recent research on these illusions, for example adversarial examples and adversarial networks. It seems to indicate that these illusions are quite different from human visual illusions and stem from the mathematical nature of machine learning. For example, adding a small amount of noise (sometimes with a magnitude lower than the smallest value representable in standard image formats!) to a correctly classified image can result in a wrong but very confident prediction. The most prominent viral example was a picture of a school bus that, after a small amount of noise was added, was classified with high confidence as an ostrich. Other examples can be found in the introduction of [1].

[1] https://openai.com/blog/adversarial-example-research/
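To make the "tiny noise, confident wrong answer" effect concrete, here is a minimal sketch on a hypothetical linear classifier with random weights (not a trained network, and not the code behind [1]). It uses the fast gradient sign idea: for a linear score the gradient with respect to the input is just the weight vector, so nudging every pixel by a tiny eps against sign(w) shifts the score by eps times the sum of |w|, which grows with the number of pixels. The image size, weights, bias, and eps are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical setup: a linear "classifier" on a 224x224 RGB image flattened
# to N pixel values in [0, 1]. score > 0 means class A ("school bus"),
# score < 0 means class B ("ostrich"). Weights are random, not trained.
N = 224 * 224 * 3
EPS = 0.5 / 255  # half of one 8-bit intensity step (assumed budget)


def sigmoid(z):
    """Probability of class A given the raw score z."""
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, x):
    """Linear score w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fgsm_linear(w, x, eps):
    """Fast-gradient-sign step for a linear model.

    The gradient of the score with respect to x is w, so moving every pixel
    by -eps * sign(w_i) lowers the score by eps * sum(|w_i|), slightly less
    where clipping to the valid [0, 1] range kicks in.
    """
    return [min(1.0, max(0.0, xi - eps * math.copysign(1.0, wi)))
            for wi, xi in zip(w, x)]


rng = random.Random(0)
w = [rng.uniform(-1.0, 1.0) for _ in range(N)]
x = [rng.uniform(0.0, 1.0) for _ in range(N)]
# Choose the bias so the clean image is classified as class A with score 10.
b = 10.0 - score(w, 0.0, x)

x_adv = fgsm_linear(w, x, EPS)
clean, adv = score(w, b, x), score(w, b, x_adv)
max_change = max(abs(a - c) for a, c in zip(x_adv, x))

print(f"clean score {clean:.1f}, P(class A) = {sigmoid(clean):.5f}")
print(f"adv score   {adv:.1f}, P(class A) = {sigmoid(adv):.2e}")
print(f"largest per-pixel change: {max_change:.5f} (1/255 = {1 / 255:.5f})")
```

With about 150,000 input values, a per-pixel change of half an 8-bit step adds up to a score shift of roughly 150, enough to turn a clean score of 10 into a very confident wrong answer. Real deep networks are not linear, but this accumulation of many tiny changes across high-dimensional inputs is one of the explanations proposed in the adversarial-examples literature.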