The use case I see the most in my career is to use LSH to help solve the "ANN" problem = approximate nearest neighbors (typically with ranked results). I've seen ANN used many times for near-duplicate detection and in recommendation systems.
Although I don't have access to the proprietary code used, it's most likely that an LSH algorithm is behind the scenes in every modern search engine (to avoid serving duplicates), many modern ranking systems such as Elasticsearch (because items are typically vectorized and retrieved based on these vectors), and most recommendation systems (for similar reasons as ranking). For example, all of these pages probably have an LSH algorithm at some point (either batch processing before your request, or in some cases real-time lookups):
* Every search result page on Google
* Every product page on Amazon (similar products)
* All music suggestions on Spotify or similar
* Every video recommendation from TikTok, YouTube, or Instagram
Yes, e.g. many IR systems use cosine similarity to compute query vector/term vector similarity, and simhashing approximates cosine similarity. OTOH, some IR systems instead use a set-theoretic measure, Jacquard similarity, which can be approximated by minhashing.
Here's an example I can think of. Suppose you have a bunch of text documents, and you know that some documents are similar but not identical (e.g. plagiarized and slightly modified). You want to find out which documents are similar.
You can first run the contents through some sort of embedding model (e.g. the recent OpenAI embedding model [1]), and then apply LSH on those embeddings. The documents that have the same LSH value would have had very similar embeddings, and thus very similar content.
Collision detection in games. This problem is O(n^2) because you have to check every object against every other object.
You can almost only check objects that inhabit the same buckets (there are caveats, usually neighboring buckets are also checked), eliminating objects that couldn't possibly collide by virtue of e.g. being on the other side of the map. Of course this is still O(n^2) because every object could be in the same bucket (but that's unlikely).
Geohash is one use case. Latitude and longitude address is converted to a geohash which has the nice property of nested rectangles have the same hash prefix.
Although I don't have access to the proprietary code used, it's most likely that an LSH algorithm is behind the scenes in every modern search engine (to avoid serving duplicates), many modern ranking systems such as Elasticsearch (because items are typically vectorized and retrieved based on these vectors), and most recommendation systems (for similar reasons as ranking). For example, all of these pages probably have an LSH algorithm at some point (either batch processing before your request, or in some cases real-time lookups):
* Every search result page on Google * Every product page on Amazon (similar products) * All music suggestions on Spotify or similar * Every video recommendation from TikTok, YouTube, or Instagram
etc.