Hacker News
by _flag 4402 days ago
Summary of the paper for those who don't want to read it:

So basically there are two categories of "learning" involved in this sort of research: supervised and unsupervised. In supervised learning, someone gives the computer a long list of concepts and their attributes ("frog", "green frog", "jumping frog") along with a set of pictures for each item, and all of that is fed into a visual-recognition algorithm. In unsupervised learning, the computer is given a concept like "frog" but then has to discover all the variations itself and gather its own matching visual data.

The claim in this paper is that they have made the unsupervised learning as strong as the supervised learning. That is, they give the computer a concept ("frog"); it searches Google Books for common variations ("green frog", "jumping frog") and then uses Google Image Search to fetch images for each of those queries. They then remove the obvious false positives (they test which images seem to screw up their learning algorithm and leave those out), and the results they get are on par with those of supervised learning methods.
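For anyone who wants something concrete, the two automatic steps described above (mining common variations of a concept from text, then throwing out images that don't fit) can be sketched in a few lines of Python. This is a toy under stated assumptions, not the paper's method: a tiny in-memory string stands in for Google Books, the image-search step is skipped entirely because it needs an external service, each image is assumed to already be a feature vector, and the distance-from-median filter is a simpler heuristic swapped in for the paper's test of which images hurt the learner. The names `discover_variations`, `prune_outliers` and `STOPWORDS` are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import median

# Toy stand-in for a large text corpus such as Google Books (assumption).
corpus = (
    "The frog croaked. The frog slept. A green frog sat. "
    "A green frog jumped. The jumping frog leapt. "
    "Another jumping frog landed. One purple frog hid."
)

# Words that often precede a noun but are not useful variations (hypothetical list).
STOPWORDS = {"a", "an", "the", "one", "another", "this", "that"}


def discover_variations(concept, text, min_count=2):
    """Return 'modifier concept' phrases seen at least min_count times in text."""
    # Match a single word immediately followed by the concept, e.g. "green frog".
    pattern = re.compile(r"\b([a-z]+) " + re.escape(concept) + r"\b")
    counts = Counter(
        m.group(1)
        for m in pattern.finditer(text.lower())
        if m.group(1) not in STOPWORDS
    )
    # Keep only frequent modifiers; sort for a stable, readable result.
    return sorted(f"{mod} {concept}" for mod, n in counts.items() if n >= min_count)


def prune_outliers(features, max_distance):
    """Keep feature vectors within max_distance of the coordinate-wise median.

    Stand-in heuristic (not the paper's test): the median is used instead of
    the mean so that a few bad images don't drag the centre towards themselves.
    """
    centre = [median(f[i] for f in features) for i in range(len(features[0]))]
    return [f for f in features if math.dist(f, centre) <= max_distance]


print(discover_variations("frog", corpus))  # ['green frog', 'jumping frog']

# Pretend each fetched image was already turned into a 2-D feature vector
# (assumption); the last one is an obvious false positive.
images = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(prune_outliers(images, max_distance=1.0))  # [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
```

Here "purple frog" is dropped for appearing only once and "the frog" because "the" is in the stopword list; raising or lowering `min_count` trades coverage of rare variations against noise.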

----------------------

In my opinion, this is only mildly interesting because Google Image Search functions based on human input anyway -- Google knows the difference between a "frog" and a "jumping frog" or even a "camel" simply because people on the internet caption such images and Google can make associations between images and their captions. Essentially, what the researchers have managed to do is outsource the work of some grad student to millions of people around the world through Google.

Of course, it could be argued that there is some sort of parallel with what humans actually do (we know what things are called because we hear other people call them that), but even if I didn't know the name of an animal I could still tell you when the same animal is in different pictures, and I can also tell you when it's jumping and what colour it is. I don't need to have someone caption the image for me to understand the broad range of situations to which the caption "jump" applies.

2 comments

>I don't need to have someone caption the image for me to understand the broad range of situations to which the caption "jump" applies.

I wonder if this has anything to do with the fact that we can jump too, so we can translate the frog's position into something we do as well.

Of course, one can argue that we can do the same for non-anthropomorphic things as well. What I think is that we don't directly relate pictures to each other, the way the software is taught to. What we do is translate the 2D picture into something we'd see in the 3D world. And that 3D "vision" isn't just another image. It represents an object in our world: something that has shape, existence, etc., and something we can observe with our other senses as well. For us a picture usually isn't just an abstract thing, an arbitrary pattern of colours. It represents something concrete, something about which we have tons of other pieces of knowledge as well.

So we relate pictures by checking whether they map to the same real-world object. And here that "object" is a sort of nexus of the many pieces of information we have about it, which is itself the product of many direct and indirect human experiences.

So I don't really think we are in a position to teach a computer to do anything like that.

> I don't need to have someone caption the image for me to understand the broad range of situations to which the caption "jump" applies.

I'm more interested in metaphor and analogy.

My 3.5-year-old son said, "Look at the rain! It is bouncing like hopping frogs!"

I don't know if he came up with that himself; it's not in any of his books. I guess he jumps like a hopping frog at nursery and transferred that to the rain.

I'm not so interested in a computer that is trained on frogs and which sees a hopping frog and describes it as such. If it saw a hopping cat and said "this thing is hopping, but I don't know what it is", then I'd be interested.

Am I being too harsh on the robots?