Hacker News new | ask | show | jobs
by ademarre 3874 days ago
I accidentally upvoted your comment, but I meant to downvote. Sorry.

If fear of collisions is the reason you don't use a crypto hash function, then you are using a broken hash function.

A hash doesn't satisfy all the requirements of the problem, but it's not because of collisions.

2 comments

Your reasoning behind downvoting is totally unfounded. Collisions may not be the primary reason for not using a hash function, but to discount the fact that it results in an unreliable equality operator is just ignorant.

This fact applies to any hash function even those you don't consider broken.

I don't understand. Aren't collisions why a hash of any kind isn't sufficient for proving equality of inputs?
A hash function with a fixed output length and arbitrary input length necessarily implies that collisions exist. So in a very strict sense, you are correct that equality of hash output doesn't prove equality of input. But that's much too limiting for the real world.

As optimiz3 already pointed out, a critical property of crypto hash functions is extreme difficulty in finding collisions, let alone meaningful ones. So if a hash function passes muster cryptographically, then we can use it to practically assert equality of inputs.

In other words, it's so improbable that we treat it as if it's impossible. When that assumption is not safe to make anymore, that's when we upgrade to a new hash function.

I very strongly disagree with the premise of your thinking. The reason hash collisions do not matter in practice is due to the fact that cryptographically strong hash functions have preimage resistance, making it robust to attacks. However it is an extremely reasonable point that a small probability of error in determining equality is much more of a concern. Imagine if your '==' operator only worked 99 times out of a hundred. Your emphasis on practicality is unfounded in this context.
You are dramatically undervaluing collision resistance.

> Imagine if your '==' operator only worked 99 times out of a hundred.

Imagine if the probability of your '==' operator failing was 2^(-128). I would be fine with that. And that's the same "gamble" made by any protocol that relies on the collision resistance of SHA-256. You would have to try 1 quadrillion per second for 10 quadrillion years before having a 50% chance of hitting a collision.

Your reasoning is incorrect due to the fact that the probability is simply for the average case.

Just because the chance of it happening is small, does not mean you will have to try 1 quadrillion per second for 10 quadrillion years before a collision occurs.

Granted collisions may not occur frequently, but a collision could occur at any time.

FYI, procedures like these would not pass the FAA code regulations for airlines.

Let's remember the context of this discussion. We are talking about hash functions in cryptographic protocols. No one is proposing that you use a hash function to test the equality of two values when you're not constrained to the parameters of some cryptographic problem. E.g. when the two values you want to check are known.

> "Granted collisions may not occur frequently, but a collision could occur at any time."

Your reasoning sounds something like this: "The risk of a collision is greater than zero, therefore you should worry about collisions."

Which is like saying: "The risk of being hit by a meteorite is greater than zero, therefore you should worry about meteorites."

The problem with such reasoning is that it ignores probability.

> "Just because the chance of it happening is small, does not mean you will have to try 1 quadrillion per second for 10 quadrillion years before a collision occurs."

You're right! It means that you would have to try 1 quadrillion per second for 10 quadrillion years before having a 50% chance of hitting a collision.

If you don't like that 50% number, then let's use 10^(-15); a probability of 1 in 1 quadrillion. According to the birthday problem [0], and using the same collision resistance (2^128) and hash rate (1 quadrillion per second) from my previous example, you would have to work for 475 million years before having a 10^(-15) probability of hitting a collision.

Accidental collisions simply are not a realistic problem to worry about. I don't believe that any such collisions are known to have ever occurred in the wild with modern crypto hash functions.

Denying collision resistance renders hash functions almost useless, even for protocols where pre-image resistance is the key property. Think about that. Even if you were confident that a hash input wasn't crafted to conduct a pre-image attack, but you were concerned about collisions, then you should mistrust the hash result simply because it may have been an accidental collision.

I'm sorry you disagree with my reason for downvoting your initial comment. But I did so (or tried to) because (1) I feel it was not a correct answer to the question, and (2) it contained misinformation by implying that the likelihood of collisions is much greater than it actually is.

[0] https://en.wikipedia.org/wiki/Birthday_attack#Mathematics - Look at the table. My numbers were taken directly from the 256-bit row.

Collisions are when two different inputs hash to the same value, which breaks the security of the hash function.

If a hash function is cryptographically secure, it has the property where finding collisions is infeasable, which makes it suitable for evaluating equality.

This is why cryptographic hash functions are used as a proxy for passwords in order to avoid storing the plaintext.

"Collisions are when two different inputs hash to the same value, which breaks the security of the hash function" This is so wrong. Cryptographically secure hash functions merely claim that the probability of a collision is low. By your definition, it is impossible to find a secure hash function that works on an arbitrary input size.

"If a hash function is cryptographically secure, it has the property where finding collisions is infeasable, which makes it suitable for evaluating equality." This is so wrong.

It has a high probability of evaluating equality but in no way is it suitable. The primary benefit that cryptographic hash functions offer is that it is impractical to conduct a chosen plaintext attack.

"This is why cryptographic hash functions are used as a proxy for passwords in order to avoid storing the plaintext."

This is true, and is due to preimage resistance and the infeasibility of a chosen plaintext attack.