| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by dahart 2296 days ago

Awesome!

> The table uses power-of-two sizes instead of prime numbers because pow2/AND masking is a fast single instruction and the modulo operator is much slower.

I was under the impression that the idea of prime number hash table sizes being faster was shredded a couple decades ago? Maybe by Bob Jenkins?

1 comments

amalcon 2296 days ago

The idea behind prime number sizes is that you can use a hash function with periodicity issues, which would normally cause collisions. E.g. in C++, it's totally standard-compliant for std::hash<size_t>::operator() to be the identity function. The prime number of buckets will tend to mitigate this.

These days, the preferred solution to this is usually to just use a better hash function without periodicity issues, so the bucketing no longer matters.

dahart 2296 days ago

Exactly. Indeed - I found the link I was thinking of - Bob Jenkins wrote about that in his famous 1997 article on hashes in Dr Dobbs. Having learned in college that hash table sizes are ideally prime, reading this article was shocking at the time. Ironically, I learned that hash table sizes should be prime in college after 1997, and didn’t find this article until later. I guess best practice knowledge takes time to percolate.

“The best hash table sizes are powers of 2. There is no need to do mod a prime (mod is sooo slow!). If you need less than 32 bits, use a bitmask. For example, if you need only 10 bits, do h = (h & hashmask(10));”

https://burtleburtle.net/bob/hash/doobs.html