Hacker News
by mannykannot 2788 days ago
That raises the interesting question of how object recognition in streams of images is progressing, beyond being just object recognition within the individual frames. Humans are capable of extracting a lot of additional information in such situations, and are actually helped when the perspective on a given object changes. One cannot give current machine vision a pass if, through lacking this capability, it is under-performing.

And moving one's head to get a better view is only one thing that a human might do. Firstly, of course, we must recognize that we are having difficulty, and current machine vision seems to be somewhat deficient in this regard. Then, even without being able to get a different perspective, we will do things like make guesses as to what might be there (using our extensive semantic models of the world) and figure out whether they are a good fit to what we see, and/or we might try to extract specific features of the problematic area and search our memories for objects that might plausibly match, bearing in mind that we might be seeing them from a different perspective than we are accustomed to. We are also quite good at estimating whether an object might be a problem for us, even if we have not positively identified it. There is a lot more to it than just moving one's head.