Hacker News
by varelaz 1704 days ago
I'm not sure it makes sense to query by Hamming distance on the hash. The closest hashes don't guarantee the closest images at all. Instead, you can count how many parts match with a query like: select video_id from video_hashes where hash in (...) group by video_id order by count(distinct hash) desc limit 10

Technically it can be fast, since the selection on hash can be very narrow. You only need an index on (hash, video_id).
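
For anyone who wants to try the query above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the comment; the column types, the index name `idx_hash_video`, the helper `top_matches`, and the sample rows are assumptions made up for illustration, and a real setup would likely use a different database.

```python
import sqlite3

def top_matches(conn, query_hashes, limit=10):
    """Rank videos by how many distinct query hashes they contain."""
    if not query_hashes:
        return []
    # One "?" placeholder per hash, so values are bound rather than interpolated.
    placeholders = ",".join("?" for _ in query_hashes)
    sql = (
        "SELECT video_id, COUNT(DISTINCT hash) AS matched "
        "FROM video_hashes "
        f"WHERE hash IN ({placeholders}) "
        "GROUP BY video_id "
        "ORDER BY matched DESC, video_id "  # video_id breaks ties deterministically
        "LIMIT ?"
    )
    return conn.execute(sql, (*query_hashes, limit)).fetchall()

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per (video, part hash).
conn.execute("CREATE TABLE video_hashes (video_id INTEGER, hash INTEGER)")
# The index suggested in the comment: hash first, then video_id.
conn.execute("CREATE INDEX idx_hash_video ON video_hashes (hash, video_id)")
conn.executemany(
    "INSERT INTO video_hashes VALUES (?, ?)",
    [
        (1, 0xA1), (1, 0xB2), (1, 0xC3),  # video 1 shares three parts with the query
        (2, 0xA1), (2, 0xD4),             # video 2 shares one part
        (3, 0xE5),                        # video 3 shares none
    ],
)

result = top_matches(conn, [0xA1, 0xB2, 0xC3])
print(result)
```

Note that this only finds exact hash matches per part, so it would miss parts whose hashes differ by even a single bit.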

1 comment

OP is referring to pHashes, aka perceptual hashes, where close hashes should indeed indicate similar images.
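
To make the distinction concrete, here is a rough sketch of comparing two perceptual hashes by Hamming distance, assuming each hash is a 64-bit integer. The sample values and the function name `hamming_distance` are invented for illustration and are not from any particular pHash library.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two integer hashes differ."""
    return bin(a ^ b).count("1")

# Hypothetical 64-bit hash values, made up for this example.
original = 0xF0F0F0F0F0F0F0F0
near_duplicate = original ^ 0b101     # same hash with two bits flipped
unrelated = 0x0F0F0F0F0F0F0F0F        # every bit differs from the original

near_distance = hamming_distance(original, near_duplicate)
far_distance = hamming_distance(original, unrelated)
print(near_distance, far_distance)
```

With a perceptual hash, a small distance is expected to mean visually similar frames, which is not true of a cryptographic hash. What counts as "small" depends on the hash function and the data.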