Hacker News new | ask | show | jobs
by barbs 2599 days ago
I feel like "BlurHash" is a misleading name, since it's not a hash of the image data that gets created, it's just highly compressed to just some data about the colours used. Unless I'm misunderstanding something?
2 comments

You could say that hash is a very aggressive, lossy compression.

Same image will result in same hash. Different images will often result in different hashes. That makes it a valid hash.

> You could say that hash is a very aggressive, lossy compression.

You could but it would be about as misleading as a sentence could possibly be.

The purpose of compression is to preserve content as much as possible: similar inputs should give similar outputs, and the output should provide as much information about the input as it can.

Hashing deliberately does the exact opposite - slightly different inputs should give wildly different outputs - as its primary purpose in the case of crypto hashes, and in the case of index hashes as a performance optimization (which is the primary purpose of index hashes).

If you want specifically a cryptographic/indexing hash, where 1 bit of change in input changes ~50% of bits in the output, then that's one possible goal/restriction. But that's just one kind of hash function.

But you can have hash functions with the goal of preserving similarity. For example soundex is a hash function with that constraint. From: https://pdfs.semanticscholar.org/06d6/8587c27058dd6ab3fb8238...

> For example value 1 = "Damieva" and value 2 = "Dameiva." These two values will produce the same Soundex hash value, creating a match.

There's also the whole class of LSH https://en.wikipedia.org/wiki/Locality-sensitive_hashing

From Wikipedia: "A hash function is any function that can be used to map data of arbitrary size onto data of a fixed size."

As for hash functions having the desired property of different inputs leading to vastly different outputs, that only describes one subset. There are many hash algorithms that aim to achieve precisely the opposite, such as simhash.

Technically speaking, a hash is just a function that maps arbitrary data into a fixed size set of outputs.

f(x) = 1 is a hash function. A catastrophically bad one to use for something like a hashmap, but a hash function nonetheless.

Although if you’re implementing a hashmap from scratch, that is a very good hash function to use temporarily while building collision management.