
That raises the interesting question of how object recognition in streams of images is progressing, beyond being just object recognition within the individual frames. Humans are capable of extracting a lot of additional information in such situations, and are actually helped when the perspective on a given object changes. One cannot give current machine vision a pass if, through lacking this capability, it is under-performing.

And moving one's head to get a better view is only one thing that a human might do. Firstly, of course, we must recognize that we are having difficulty, and current machine vision seems to be somewhat deficient in this regard. Then, even without being able to get a different perspective, we will do things like make guesses as to what might be there (using our extensive semantic models of the world) and figure out whether they might be a good fit to what we see, and/or we might try to extract specific features of the problematic area and search our memories for objects that might plausibly match, bearing in mind that we might be seeing them from a different perspective than we are accustomed to. We are also quite good at estimating whether an object might be a problem for us, even if we have not positively identified it. There is a lot more to it than just moving one's head.