Hacker News
by zby 2854 days ago
"A human sees a 3D scene objects, creatures, texture and lighting (and it evaluates the scene based on these concepts and how they related to each other even if it's never seen green fields, sheep, dry stone walls or fog before)."

Our eyes are not that different from cameras. They also have a set of pixels that can take on values; the pixels are not that regular, and maybe the values are not that discrete, but it is not as if the retina sees objects or textures. There are just some neural layers that do the pixels->objects computation.

4 comments

That's actually not at all how the eye works. We saccade a tiny spot around the scene based on our semantic intent (it's how, for example, you can see a hole to the sunny outside from inside a cave, while no camera will be able to manage the white balance). Then we have specific hardware doing feature extraction in the early visual system and feeding into the vision process.

Finally the semantic interpretation feeds deeply into the vision system. For example, though we have binocular vision you can only get stereopsis via parallax basically as far as you can reach -- after that you use semantic clues in the scene to understand that a barn is bigger than a person so that the person must be closer.

> Our eyes are not that different from cameras

Our eyes are plenty different. Above all, they are driven by the neurons behind them to scan the scene as the brain tries to figure out the details, whereas a neural network passively takes whatever feed the camera captures.

See, for example, an owl's head movements as it triangulates its prey's distance.

There's a lot more going on than just the vision part: a cascade of neural structures rather than one big uniform net, with regions dedicated to detecting edges and understanding depth that are separate from, and feed into, the classification region.

And somewhere we have structures that pick up differences from one scene to the next, and dedicated neurons that react to change and movement in a scene independently of the brain's classification.

Oh, and it is also apparent that some superstructure does innate detection and supersedes learning, i.e. tests suggest mammals are scared of serpents even if they were never exposed to one, while the same doesn't happen with spiders, hinting that serpent detection and fear is hardwired and not learned. Or at least learned by evolution and not by the plasticity of the brain's neurons.

The part about the movements - agreed (maybe we need to add this to the machines to improve them) - but the rest is just about additional layers - machines are not restricted to just one layer either.

Yeah, obviously human visual inputs are a finite set of data points from rods and cones, which might be considered roughly akin to pixels. But by "seeing" I'm clearly referring to what takes place in the visual cortex, which is incredibly efficient at converting those inputs, in a lossy manner, into geometry and objects/creatures/expressions with qualitative associations. It makes heavy use of hardwired priors that are evolved rather than learned through evaluation against past sensory input, whilst at the same time apparently being entirely incapable of processing or storing the original sensory input values in a sufficiently discrete manner to replicate the pixel-by-pixel evaluation a computer vision system can achieve.

Nope, we have two eyes and the ability to change focus. This essentially means we are dealing with video with added depth perception. Recent motion really pops out because we are comparing what we see with what we just saw.

Self-driving cars with lidar are a much better representation of human vision than a single image. We also do well with photos, but that's a significant step down.