
When you run SIFT on an image, you get dozens (maybe hundreds) of SIFT features back. The trouble is that each individual SIFT feature is a _local_ image descriptor -- it describes a single point in the image. You can't just concatenate each image's list of SIFT descriptors and do a Hamming comparison on the results, because it's not guaranteed that both images will have all of the same SIFT descriptors, nor that they would be in the same order. To compare images using local descriptors, you have to compare every local feature in one image with every local feature in every other image. This is great for comparing two images, or for finding where one image is located in another image (homography matching), but it does not scale to large image sets.
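To make the cost concrete, here is a minimal sketch of brute-force matching between two sets of local descriptors. It uses random 128-dimensional vectors as stand-ins for real SIFT output, and `match_descriptors` and the 0.75 ratio threshold are illustrative choices of mine, not anything from a particular library.

```python
import math
import random

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Brute-force nearest-neighbour matching with a ratio test.

    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, a in enumerate(desc_a):
        # Distance from this descriptor to every descriptor in the other image.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        # Keep the match only if the best candidate clearly beats the runner-up.
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

random.seed(0)
# Stand-ins for real SIFT descriptors: 50 vectors of 128 values each.
image_a = [[random.random() for _ in range(128)] for _ in range(50)]
# image_b has the same features in reverse order, plus 20 unrelated ones.
image_b = image_a[::-1] + [[random.random() for _ in range(128)] for _ in range(20)]

matches = match_descriptors(image_a, image_b)
comparisons = len(image_a) * len(image_b)  # distance computations for ONE image pair
print(len(matches), "matches from", comparisons, "comparisons")
```

The matching still works even though the features are reordered and `image_b` has extras, but the number of distance computations is the product of the two feature counts, and that is for a single pair of images.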

In contrast, descriptors like perceptual hashes look at the entire image, and so are a _global_ image descriptor.
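As a rough illustration of a global descriptor, here is a sketch of an average hash, which is a simpler relative of pHash (it thresholds against the mean brightness instead of using a DCT). It assumes the image has already been reduced to an 8x8 grayscale grid; the function names are my own.

```python
def average_hash(pixels):
    """Return a 64-bit integer hash for an 8x8 grid of grayscale values."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # One bit per pixel: 1 if brighter than the image mean, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")

# A synthetic 8x8 gradient, a uniformly brightened copy, and an inverted copy.
original = [[(x + y) * 16 for x in range(8)] for y in range(8)]
brighter = [[p + 10 for p in row] for row in original]
inverted = [[255 - p for p in row] for row in original]

print(hamming(average_hash(original), average_hash(brighter)))  # 0: same structure
print(hamming(average_hash(original), average_hash(inverted)))  # large: different structure
```

Each image becomes one fixed-size value, so comparing two images is a single XOR and bit count no matter how much detail the images contain.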

There are ways to convert local SIFT image descriptors into a single global image descriptor for more rapid lookup (Bag of Visual Words is one technique that comes to mind), but SIFT and pHash really are in two different categories.
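A toy sketch of the Bag of Visual Words idea: each local descriptor is assigned to its nearest "visual word" in a codebook, and the image is then summarized by a histogram of word counts. In practice the codebook would be learned (typically with k-means) from many 128-dimensional SIFT descriptors; the hand-written 2-D codebook and descriptors here are assumptions to keep the example small.

```python
import math

def bovw_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Index of the visual word closest to this descriptor.
        nearest = min(range(len(codebook)), key=lambda k: math.dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts) or 1  # avoid dividing by zero for an empty image
    return [c / total for c in counts]

# Hypothetical codebook of three visual words.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Two images with similar kinds of local features, detected in a different order.
image_a = [[0.1, 0.0], [0.9, 0.1], [1.0, 0.2], [0.0, 0.8]]
image_b = [[0.0, 0.9], [0.1, 0.1], [0.8, 0.0], [1.1, 0.1]]

hist_a = bovw_histogram(image_a, codebook)
hist_b = bovw_histogram(image_b, codebook)
print(hist_a, hist_b)
```

The histogram has a fixed length and does not depend on feature order, so it can be compared or indexed like any other global descriptor, at the cost of throwing away where in the image each feature was found.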

More info on pHash: https://hackerfactor.com/blog/index.php%3F/archives/432-Look...

Example of SIFT for fine-grained image matching: https://docs.opencv.org/3.4/d1/de0/tutorial_py_feature_homog...