| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by hthh 3556 days ago
	Dynamic resizing fixes this, although hash collisions can still be a problem (albeit a very unlikely one if your hash is secure). https://en.wikipedia.org/wiki/Hash_table#Dynamic_resizing

3 comments

mason55 3556 days ago

The number of collisions is only loosely based on the hash function, it's much more closely tied to the fill factor.

If you keep your hash table 1% full then you're unlikely to run into a collision no matter how bad your hash function is. If you keep your table 99% full then it doesn't matter how good your hash function is. You still have to modulo your hash by the number of buckets.

link

wongarsu 3556 days ago

Yes, there are implemenations of hash tables that solve this problem. Hash collisions still ruin your worst case performance, but as long as nobody is asking how long you spend on inserts you can solve that too. As a trivial example, on each insert make the hashtable larger and change the hash function randomly until you found a solution that has O(1) lookup for all elements.

But now we're far outside the realm of "properties of hash tables" and instead inside "properties of my hash table", and any decent interviewer should recognize that.

link

lorantt 3556 days ago

I think people here seem to implicitly assume linked buckets, those are bad on modern architectures for several reasons.

Look at (Hopscotch,) Robin Hood or Cuckoo Hashing for hashing with linear probing, high fill factor (~0.9) and _amortized_ O(1). I've seen a paper somewhere that proved Robin Hood worst-case O(log log n) afair.

http://www.sebastiansylvan.com/post/robin-hood-hashing-shoul...

https://en.m.wikipedia.org/wiki/Hopscotch_hashing

You can make open probing performantly concurrent with 2 bytes of memory overhead per key/value too:

http://preshing.com/20160314/leapfrog-probing/

link

hvidgaard 3553 days ago

My first tought about dynamic resizing is that you can have an amortized O(1) but still is O(n) worst case. Both then exploiting the hash function to force all elements into the same bucket, but also when triggering a resize.

You can make an expanding array with O(1) operations, but I'm unsure if it's applicable to a hashtable. It may be, but it's beginning to become very complex.

link