Hacker News new | ask | show | jobs
by inciampati 1821 days ago
Locality sensitive hashing is extensively used in bioinformatics and genomics.

mash is probably the best-known implementation (https://doi.org/10.1186/s13059-016-0997-x). It allows you to do highly efficient pairwise comparisons of genome sequences, where it provides a good approximates their actual pairwise sequence divergence.

Is MinHash used industrially outside of genomics? In theory, it'd be useful on any kind of text corpus. The design of the hash functions that the sketch is built from is non-trivial The hashes typically come from k-mers of a fixed length and various fast hash functions over them.