Hacker News new | ask | show | jobs
by tylerneylon 1272 days ago
The use case I see the most in my career is to use LSH to help solve the "ANN" problem = approximate nearest neighbors (typically with ranked results). I've seen ANN used many times for near-duplicate detection and in recommendation systems.

Although I don't have access to the proprietary code used, it's most likely that an LSH algorithm is behind the scenes in every modern search engine (to avoid serving duplicates), many modern ranking systems such as Elasticsearch (because items are typically vectorized and retrieved based on these vectors), and most recommendation systems (for similar reasons as ranking). For example, all of these pages probably have an LSH algorithm at some point (either batch processing before your request, or in some cases real-time lookups):

* Every search result page on Google * Every product page on Amazon (similar products) * All music suggestions on Spotify or similar * Every video recommendation from TikTok, YouTube, or Instagram

etc.

2 comments

Yes, e.g. many IR systems use cosine similarity to compute query vector/term vector similarity, and simhashing approximates cosine similarity. OTOH, some IR systems instead use a set-theoretic measure, Jacquard similarity, which can be approximated by minhashing.