

by xerophtye 4450 days ago

>I don't need to have someone caption the image for me to understand the broad range of situations to which the caption "jump" applies.

I wonder whether this has anything to do with the fact that we can jump too, i.e. that we can map the frog's posture onto something we do ourselves.

Of course, one can argue that we do the same for non-anthropomorphic things as well. What I think is that we don't relate pictures to each other directly, the way the software is taught to. Instead, we translate the 2D picture into something we'd see in the 3D world. And that 3D "vision" isn't just another image. It represents an object in our world, something that has shape and existence, something we can also observe through our other senses. For us, a picture rarely represents an abstract thing, an arbitrary pattern of colours. It usually represents something concrete, something about which we already have tons of other knowledge.

So we relate pictures by checking whether they map to the same real-world object. And here that "object" is a sort of nexus of the many pieces of information we have about it, the product of many direct and indirect human experiences.

So I don't really think we are in a position to teach a computer to do anything like that.