Hacker News new | ask | show | jobs
by amalcon 2296 days ago
The idea behind prime number sizes is that you can use a hash function with periodicity issues, which would normally cause collisions. E.g. in C++, it's totally standard-compliant for std::hash<size_t>::operator() to be the identity function. The prime number of buckets will tend to mitigate this.

These days, the preferred solution to this is usually to just use a better hash function without periodicity issues, so the bucketing no longer matters.

1 comments

Exactly. Indeed - I found the link I was thinking of - Bob Jenkins wrote about that in his famous 1997 article on hashes in Dr Dobbs. Having learned in college that hash table sizes are ideally prime, reading this article was shocking at the time. Ironically, I learned that hash table sizes should be prime in college after 1997, and didn’t find this article until later. I guess best practice knowledge takes time to percolate.

“The best hash table sizes are powers of 2. There is no need to do mod a prime (mod is sooo slow!). If you need less than 32 bits, use a bitmask. For example, if you need only 10 bits, do h = (h & hashmask(10));”

https://burtleburtle.net/bob/hash/doobs.html