Hacker News new | ask | show | jobs
by smitherfield 3180 days ago
Perhaps I'm missing something, but wouldn't the most efficient possible hash function for a set of unique integers be identity? So I don't see how an ID column would benefit.
1 comments

You can use such a hashfunction, but it'd not be a good one for general purpose. Remember that hashtables, and also our hash-indexes, use buckets / ranges of hash values to keep the hash-table at reasonable sizes. If you don't use a hash function that has a good bit perturbation, you can end up with a lot of values in the same bucket.

Consider e.g. the common implementation where buckets are determined by masking out either the lowest or the highest bits of the hashvalue. If you mask out the highest bits and your values aren't sequential (pretty common), you end up with a lot of collisions. More extremely, if you mask the low bits and shift, if you only have small values everything ends up in the first bucket. Therefore what you want is a hashfunction where a one bit change at "one side" of the input value, is likely to affect most of the remaining bits.