Hacker News new | ask | show | jobs
by hthh 3556 days ago
Dynamic resizing fixes this, although hash collisions can still be a problem (albeit a very unlikely one if your hash is secure). https://en.wikipedia.org/wiki/Hash_table#Dynamic_resizing
3 comments

The number of collisions is only loosely based on the hash function, it's much more closely tied to the fill factor.

If you keep your hash table 1% full then you're unlikely to run into a collision no matter how bad your hash function is. If you keep your table 99% full then it doesn't matter how good your hash function is. You still have to modulo your hash by the number of buckets.

Yes, there are implemenations of hash tables that solve this problem. Hash collisions still ruin your worst case performance, but as long as nobody is asking how long you spend on inserts you can solve that too. As a trivial example, on each insert make the hashtable larger and change the hash function randomly until you found a solution that has O(1) lookup for all elements.

But now we're far outside the realm of "properties of hash tables" and instead inside "properties of my hash table", and any decent interviewer should recognize that.

I think people here seem to implicitly assume linked buckets, those are bad on modern architectures for several reasons.

Look at (Hopscotch,) Robin Hood or Cuckoo Hashing for hashing with linear probing, high fill factor (~0.9) and _amortized_ O(1). I've seen a paper somewhere that proved Robin Hood worst-case O(log log n) afair.

http://www.sebastiansylvan.com/post/robin-hood-hashing-shoul...

https://en.m.wikipedia.org/wiki/Hopscotch_hashing

You can make open probing performantly concurrent with 2 bytes of memory overhead per key/value too:

http://preshing.com/20160314/leapfrog-probing/

My first tought about dynamic resizing is that you can have an amortized O(1) but still is O(n) worst case. Both then exploiting the hash function to force all elements into the same bucket, but also when triggering a resize.

You can make an expanding array with O(1) operations, but I'm unsure if it's applicable to a hashtable. It may be, but it's beginning to become very complex.