by piaste
2605 days ago
> You could say that hash is a very aggressive, lossy compression.

You could, but it would be about as misleading as a sentence could possibly be. The purpose of compression is to preserve content as much as possible: similar inputs should give similar outputs, and the output should provide as much information about the input as it can. Hashing deliberately does the exact opposite: slightly different inputs should give wildly different outputs. For crypto hashes that is the primary purpose; for index hashes it is a performance optimization (which is the primary purpose of index hashes).
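To make the "wildly different outputs" point concrete, here is a minimal sketch using SHA-256 from Python's standard `hashlib`. The two input strings and the helper name `hamming_bits` are my own choices for illustration, not anything from the thread.

```python
import hashlib


def hamming_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Two inputs that differ only by a transposition of two adjacent letters.
d1 = hashlib.sha256(b"Damieva").digest()
d2 = hashlib.sha256(b"Dameiva").digest()

# A SHA-256 digest has 256 bits; for a good crypto hash roughly half of them
# are expected to differ, however similar the inputs are.
differing = hamming_bits(d1, d2)
print(f"{differing} of 256 bits differ")
```

The exact count depends on the inputs, so no specific number is claimed here, only that it should land near 128.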
But you can have hash functions whose goal is to preserve similarity. For example, Soundex is a hash function with that constraint. From https://pdfs.semanticscholar.org/06d6/8587c27058dd6ab3fb8238...:
> For example value 1 = "Damieva" and value 2 = "Dameiva." These two values will produce the same Soundex hash value, creating a match.
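A rough sketch of the quoted claim, assuming the common American Soundex rules (keep the first letter, map consonants to digits, collapse adjacent duplicate codes, treat H and W as transparent, pad to four characters). The function name `soundex` and the table layout are mine; this is an illustration, not a reference implementation.

```python
# Consonant groups and their Soundex digits; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return a 4-character Soundex code, or "" if name has no ASCII letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


print(soundex("Damieva"), soundex("Dameiva"))  # D510 D510
```

Both names collapse to the same code because vowels carry no digit, so swapping "ie" for "ei" changes nothing that Soundex looks at.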
There's also the whole class of locality-sensitive hashing (LSH) functions: https://en.wikipedia.org/wiki/Locality-sensitive_hashing
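As one hedged example from that class, here is a small MinHash sketch, which estimates the Jaccard similarity of two sets of character shingles. The function names, the shingle length of 3, the 64 hash functions, and the use of seeded SHA-256 as a stand-in for a family of random hash functions are all assumptions for illustration.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-character substrings (assumes len(text) >= k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(text: str, num_hashes: int = 64) -> list:
    """For each seeded hash function, keep the minimum hash over all shingles."""
    grams = shingles(text.lower())
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.sha256(f"{seed}:{g}".encode()).digest()[:8], "big")
            for g in grams))
    return signature


def estimate_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of matching slots, an estimate of the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


base = minhash_signature("locality sensitive hashing")
close = estimate_similarity(base, minhash_signature("locality-sensitive hashing"))
far = estimate_similarity(base, minhash_signature("cryptographic digest"))
print(close, far)
```

Unlike the crypto case, similar inputs here are meant to produce signatures that agree in most slots, while unrelated inputs should agree in almost none.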