by bartvbl 1763 days ago
LSH is a really neat algorithm, but to my understanding (at least from what I’ve seen in the literature), it also tends to be rather inefficient. For it to have good precision, you need longer hashes, but that reduces recall. It also does not really tend to produce a well-balanced distribution of entries over buckets. More recent research has therefore focused on more elaborate hashing functions that are capable of producing shorter hashes and better-balanced hash maps.
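To make the precision/recall trade-off concrete, here is a rough sketch rather than anything from the article. It assumes a single hash table and that each hash component collides independently with probability equal to the pair's similarity (which holds for MinHash with Jaccard similarity). The function name and the 0.9 / 0.5 similarities are made up for illustration.

```python
def collision_probability(similarity: float, hash_length: int) -> float:
    """Probability that two items land in the same bucket.

    Assumption: the bucket key is `hash_length` independent hash components,
    and each component matches with probability `similarity`.
    """
    return similarity ** hash_length


# Hypothetical pair similarities: a near-duplicate pair and an unrelated pair.
NEAR, FAR = 0.9, 0.5

for k in (1, 4, 16, 64):
    near_hit = collision_probability(NEAR, k)  # proxy for recall
    far_hit = collision_probability(FAR, k)    # proxy for false positives
    print(f"k={k:2d}  near pair collides: {near_hit:.4f}  far pair collides: {far_hit:.2e}")
```

As k grows, the far pair's collision rate drops much faster than the near pair's (better precision), but the near pair's rate drops too, so more true matches are missed (worse recall). Practical LSH schemes usually compensate by querying several tables, which is part of where the inefficiency comes from.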

The article is well put together and nicely illustrated, though :)

1 comment

There are various ways of improving efficiency, such as using an HLL (HyperLogLog) data structure (see hyperminhash).