| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by im3w1l 1520 days ago

One way to think of hash functions is that the hash of any value is basically a random number. So we can consider what happens if the results were actually random, then what would be the probability of a collision?

Consider a function from [0, 2^256 - 1] to [0, 2^256 - 1]. That is it maps 256bit numbers to 256bit numbers. It could in theory be represented as an enormous table lookup.

Now how many such functions are there? Well there are 2^256 ways to map 0. And then 2^256 ways to map 1.. etc. We have 2^256 inputs we need to deal with each of which can give one of of 2^256 results. That turns into (2^256) ^ (2^256)

Now, how many collision free functions are there? In this case there are 2^256 ways to map 0, but then we have to pick a different number for 1, so there are only 2^256 - 1 possiblites. Then 2^256 - 2 etc... It becomes (2^256)! where the exclamation mark is factorial.

So the probability of no collisions is (2^256)! / [(2^256) ^ (2^256)]. It may not be obvious but that's a very small number. A little bit of intuition: let's say we fill out the values of our gigantic table by hand. Let's assume that by some miracle we filled out the first half without creating any collisions. Now that means we used up half the possible outputs. So every cell we fill out from now we have a 50/50 of creating a collision. And the further we get the more numbers are used up.

Now sha256 is not a randomly chosen function. But without any evidence either way, we might suspect it does have collisions.