| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Jtsummers 2090 days ago

Related: pigeonhole principle.

If you have n buckets, and n+1 items you're guaranteed to have a shared bucket.

In the case of hash algorithms you're taking an arbitrary sized input and "compressing" (in quotes because this is one way, you can't decompress it because of collisions) into a fixed size. If you permit more inputs to your hash function than there are hash values, then you will eventually have a collision.

A stupid awful hash function: n mod 100

So long as n is less than 100, you will never have a collision. But as soon as you compute the hash of n = 100, you will get a collision (with 0 in this case). Now, real world hash algorithms have larger spaces they map to, and more complicated mappings, but they all have this same problem. The larger the space (like 256-bits versus 64-bits) the less likely collisions become, but it could still happen.