Hacker News new | ask | show | jobs
by 7steps2much 1618 days ago
You are correct, at least in theory.

Assume you have two inputs, A and B.

A may hash to: A38uT75kjGz B may hash to: A38uHso629t

So yes, if you were to cut these off after the A38u then you would no longer be able to say for sure if you hashed A or B to arrive at your hash.

Of course in practice this usually isn't a problem as long as you have "a long enough" output.

1 comments

Your example made me wonder, is there a known instance from a common hash algorithm where the input results in exactly the same string representation of the output hash? Eg. "AE485D" hashes to "AE485D". Is this even mathematically possible?
The mathematical term for this is a "fixed point", where f(x) == x.

Assuming a perfectly random uniform distribution, the usual desirable property of a cryptographic hash - the probability of a hash function not having a fixed point (that is, hashing at least one x to itself) is (1-1/n)**n, where n is the number of possible outputs. As n approaches infinity - which it does pretty rapidly in this case, since we're talking about 2**32 to 2**512 in practice - this approaches 1/e, or about 37%.

So, not only is it possible, but most "good" hash functions (63% of them) will have them.

Java’s built-in hash function for integers is the identity function.
The modulus function has this property.