by pornel
493 days ago
Yikes. It's valid to assume "it will never happen" for 128 bits or more (if the hash function isn't broken), since the chance of a random collision is astronomically small, but a collision in 64 bits is within the realm of possibility (roughly a 50% chance of hitting a dupe once you have on the order of 2^32 items).
The birthday paradox is a thing. With 128 bits of entropy, you expect the 50% collision mark at roughly 2^64 keys, not 2^128. That is a lot of keys, but in my current $WORK project, if I only had 128 bits of entropy, the chance of failure in any given year would be 0.16%. That's not a lot, but it's not negligible either.
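For anyone who wants to plug in their own numbers, here is a minimal sketch of the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)), assuming keys are drawn uniformly at random from N = 2^bits values. The function names are mine, and this is not how the 0.16% figure above was computed.

```python
import math

def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance of at least one collision among n_items
    uniformly random values of the given bit width."""
    space = 2.0 ** bits
    # 1 - exp(-x), written with expm1 so tiny probabilities keep precision.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))

def items_for_half(bits: int) -> float:
    """Approximate number of items at which the collision chance hits 50%."""
    return math.sqrt(2.0 * math.log(2.0) * 2.0 ** bits)

p_64 = collision_probability(2 ** 32, 64)   # a bit under 40%
half_64 = items_for_half(64)                # about 1.18 * 2^32
half_128 = items_for_half(128)              # about 1.18 * 2^64
```

Under this approximation, 2^32 items in a 64-bit space already gives a collision chance a little under 40%, and the 50% mark lands at about 1.18 * 2^(bits/2), which is where the "half the bits" rule of thumb comes from.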
Bigger companies care more. Google has a paper floating around about how "64 bits isn't as big as it used to be" or something to that effect, complaining about how they're running out of 64-bit keys and can't blindly use 128-bit random keys to prevent duplication.
> bits of entropy
Consumer-grade hash functions are often the wrong place to look for best-case collision chances. Take, e.g., the default Python hash function, which hashes each small non-negative integer to itself (the reduction is modulo 2^61 - 1 on 64-bit CPython builds, not 2^64). The collision chance for truly random data is sufficiently low, but every big dictionary I've seen in a real-world Python project has had a few collisions. Other languages usually make similar tradeoffs (almost nobody uses cryptographic hashes by default since they're too slow). I wouldn't, by default, trust a generic 1-million-bit hash to not have collisions in a program of any size and complexity. 128 bits, even with low enough execution counts to otherwise make sense, is also unlikely to pan out in the real world.
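A quick way to see the Python behaviour for yourself, as a sketch that assumes CPython (other implementations may hash integers differently). The variable names are just for illustration.

```python
import sys

# CPython reduces integer hashes modulo this prime
# (2**61 - 1 on 64-bit builds, 2**31 - 1 on 32-bit builds).
modulus = sys.hash_info.modulus

# Small non-negative integers hash to themselves.
identity_holds = all(hash(i) == i for i in range(1000))

# Two distinct integers that differ by the modulus share a hash value,
# so structured (non-random) keys can collide easily.
a, b = 7, modulus + 7
same_hash = hash(a) == hash(b)
```

Here `identity_holds` and `same_hash` both come out `True`, which shows that the output width of a hash says little about collision behaviour when the function is not designed to scramble its input.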