|
|
|
|
|
by addictedcs
1815 days ago
|
|
For binary vectors you can choose a different distance metric (not geometric one, i.e. Jaccard) that can be used to effectively hash similar data points into similar buckets. Treating your binary vector as a set allows you to use min-hashing as your LSH schema (min-hashing is just a random permutation of the given set). This simple trick makes LSH with min-hashing quite a powerful tool for binary vectors that are extensively used in recommenders systems and other domains. I've used LSH + Min-Hash for image search (and subsequently for audio fingerprinting). If interested, I've blogged about it here [1]. [1] - https://emysound.com/blog/open-source/2020/06/12/how-audio-f... |
|