by blrs
5359 days ago
This seems like SIFT (http://en.wikipedia.org/wiki/Scale-invariant_feature_transfo...) in a nutshell.

"Here’s how image recognition works in a nutshell. It starts with identifying points of interest in an image: the points, lines, and patterns that provide sharp contrasts or really stick out from a bland, featureless background. It’s similar in some ways to how the human eye picks out edges and points by keying off the places where there’s sharp contrast. Then it looks at how these points are related to each other, that is, the geometry of the whole set of points. You could picture it as looking like a constellation of stars, even though really it’s a more sophisticated mathematical model of these points of interest and how they relate. Now it compares that model to all the other models in a huge database. Those other models come from images it has already analyzed from around the web. It looks for a matching model, but it doesn’t have to be a perfect match. In fact, it’s important that it be a bit flexible, so it doesn’t matter if it’s turned around, or shrunken, or twisted a bit. The Taj Mahal still has the basic geometry of the Taj Mahal even if you photograph it from a little bit of a different angle or photograph it lower in the frame. When Google recognizes that it matches that model best, it guesses it’s probably the Taj Mahal."
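
To make the "constellation" idea concrete, here is a toy sketch of a shape signature that stays the same when a set of points is rotated, scaled, or shifted. It is only an illustration of the invariance the quote describes, not SIFT and not whatever Google actually runs; the function names (`shape_signature`, `transform`, `same_shape`) and the sample coordinates are made up for the example. Note that matching signatures are a necessary condition only: a mirrored point set, for instance, would also pass.

```python
import math
from itertools import combinations

def shape_signature(points):
    """Sorted pairwise distances, each divided by the largest one."""
    dists = sorted(math.dist(p, q) for p, q in combinations(points, 2))
    longest = dists[-1]
    return [d / longest for d in dists]

def transform(points, angle, scale, shift):
    """Rotate by `angle` radians, scale uniformly, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    dx, dy = shift
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in points]

def same_shape(a, b, tol=1e-6):
    """True if both point sets have the same normalized distance signature."""
    sa, sb = shape_signature(a), shape_signature(b)
    return len(sa) == len(sb) and all(abs(x - y) <= tol for x, y in zip(sa, sb))

# Hypothetical interest points of a landmark (made-up coordinates).
landmark = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (1.0, 5.0)]
# The same landmark "photographed" turned, shrunken, and lower in the frame.
photo = transform(landmark, angle=0.6, scale=0.5, shift=(10.0, -2.0))
# A different constellation: one point has moved relative to the others.
other = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (1.0, 5.0)]

print(same_shape(landmark, photo))
print(same_shape(landmark, other))
```

Real systems describe the local appearance around each point as well, so the geometry is checked on top of descriptor matches rather than on bare coordinates.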
|
They do use SIFT (or at least a variant thereof) for finding and describing interest points, but SIFT by itself does no geometric matching. There are various competing approaches to doing it, although in many cases you can get very good results even without it. (Geometric matching is very slow, so people often skip that step, or apply it only to the best matches.)
Landmark detection is a recent "hot topic" in computer vision, and given a large enough dataset, it now works well in most cases.
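
A rough sketch of the "only apply it to the best matches" strategy, in case it helps: rank database images by raw descriptor-match count, then run a RANSAC-style geometric check on a short list only. Everything here is an assumption for illustration: the function names, the choice of a similarity transform (rotation, uniform scale, shift) as the geometric model, the tolerance, and the synthetic data. Points are stored as complex numbers so the transform is just `w = a*z + b`.

```python
import cmath
import random

def fit_similarity(z1, w1, z2, w2):
    """Similarity transform w = a*z + b through two point pairs."""
    a = (w1 - w2) / (z1 - z2)
    return a, w1 - a * z1

def count_inliers(pairs, iterations=200, tol=2.0, seed=0):
    """Largest number of pairs that agree with a single similarity transform."""
    if len(pairs) < 2:
        return 0
    rng = random.Random(seed)
    best = 0
    for _ in range(iterations):
        (z1, w1), (z2, w2) = rng.sample(pairs, 2)
        if z1 == z2:
            continue  # degenerate sample, cannot fit a transform
        a, b = fit_similarity(z1, w1, z2, w2)
        inliers = sum(1 for z, w in pairs if abs(a * z + b - w) <= tol)
        best = max(best, inliers)
    return best

def rerank(candidates, top_k=2):
    """candidates maps an image name to its (query_point, db_point) matches.
    Stage 1 shortlists by raw match count; stage 2 checks geometry on the
    shortlist only, since that is the expensive step."""
    shortlist = sorted(candidates, key=lambda n: len(candidates[n]), reverse=True)[:top_k]
    scored = {name: count_inliers(candidates[name]) for name in shortlist}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Synthetic data: 8 geometrically consistent matches plus 2 wrong ones.
true_a, true_b = 0.5 * cmath.exp(0.3j), 10 + 5j
model_pts = [complex(x, y) for x, y in
             [(0, 0), (40, 0), (40, 30), (10, 50), (25, 12), (5, 33), (60, 45), (52, 8)]]
good = [(z, true_a * z + true_b) for z in model_pts]
good += [(complex(70, 70), complex(5, 90)), (complex(15, 80), complex(95, 3))]

# A "lookalike" with more raw matches, but no consistent geometry.
rng = random.Random(1)
noise = [(complex(rng.uniform(0, 100), rng.uniform(0, 100)),
          complex(rng.uniform(0, 100), rng.uniform(0, 100))) for _ in range(12)]

candidates = {"taj_mahal": good, "lookalike": noise, "unrelated": noise[:3]}
result = rerank(candidates, top_k=2)
print(result)
```

The lookalike wins on raw match count, but its matches do not fit any one transform, so the geometric check pushes the consistent candidate to the top. The third image never reaches the slow step at all.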