Hacker News new | ask | show | jobs
by scentoni 1115 days ago
Those are examples of useful functions. They are not examples of useful hash functions.
3 comments

I think you're conflating "functions used to build a hash table" with "functions used to build a dictionary ADT." A hash table doesn't have to serve the purpose of storing items in a dictionary ADT. A hash table where everything that's similar collides into the same buckets is just as useful for finding "a linked list of similar items" in O(1), as a hash table where everything gets its own unique bucket is useful for finding items in O(1).
It's true that these functions are in some sense very different to traditional hash functions, because they are designed to create "collisions" (or at least be similar) for elements that are similar. But I still think it is intuitive to use the term, maybe qualified as a "similarity hash" (I've also seen "perceptual hash" or "semantic hash"). In both cases we are describing a (potentially) large object with a short code. In normal hashes, the hash is supposed to be the same for objects that are exactly the same - in similarity hashes, that is softened to similar hashes for similar objects.
Functions that create collisions can still hashes can't they?

Things like CRCs, Checksums are loosely considered hash functions: https://en.wikipedia.org/wiki/List_of_hash_functions

MD5SUM is a classic example of a hash function with a low (but still reasonable) chance of a collision

Specifically, these functions to not provide "uniform distribution".
That isn’t a requirement of an algorithm called a hash function
From the article:

> For a function to be useful as a hash function, it must exhibit the property of uniform distribution

It's also listed as the first property on the Wikipedia page:

https://en.wikipedia.org/wiki/Hash_function#Uniformity

I can’t find the text you quoted. The article starts off:

> A hash function is any function that can be used to map data of arbitrary size to fixed-size values

And the first sentence in the section you link says

> A good hash function should map the expected inputs as evenly as possible over its output range.

It doesn’t have to be “good” to be a hash function.

That's not a meaningful definition of a hash function, then.

    def hash(val):
        return 0
Yes, this is a hash function. Whether it’s “useful” or “good” is the purview of the architect.

Tangential edit: https://xkcd.com/221/

That has a pretty uniform distribution..