Hacker News new | ask | show | jobs
by smussmann 4415 days ago
> There are lots of properties by which one can evaluate a hash function, but the most important are preimage resistance (if I give you a hash, can you find an input with that hash value?), second preimage resistance (if I give you a message, can you find another that hashes to the same value?) and collision resistance (can you find two messages with the same hash?). Of those, the third appears to be much harder to meet than the first two, based on historical results against hash functions.

To my (crypto noob) eyes, it seems like second preimage resistance is just a specific case of collision resistance. In both, we're trying to find two messages that have the same hash value, but with collision resistance, we can choose the first one arbitrarily. The last sentence, saying that collision resistance is a higher barrier than second preimage resistance, thus doesn't seem to make much sense to me. What am I missing?

6 comments

Consider the three attacks that you've described here (preimage, second preimage, and collision) as differently constrained cases of:

    H(m) = x = H(m')
Where m is the original message, x is its hash, and m' is the second message that you need to find.

For a preimage, you're given a value of x, and you don't even know what m is.

For a second preimage, you're given a value of m to start from. (And, thus, you know x.)

For a collision, you get to pick any m (and x) that you want to make your collision work. This last case is easiest to work with, since you can set up the messages in whatever way you need to exploit even a very subtle vulnerability. In a second preimage attack, the original message is given to you, so it may not set things up in an easily attackable way.

In a sense, a second preimage is a specific case of a collision. The critical difference is that a second preimage is a targeted collision.

Collision resistance is harder to meet in the same way that "stronger" results are harder to prove. That is, I think you're correct and that my wording was just confusing, sorry!

Historically, many hash functions have been broken by collisions, but even very "weak" hash functions (i.e. MD5) still have full design strength for preimage and second-preimage resistance (as far as I know).

Out of curiosity is there a reason why cert-test.sandbox.g.c resolves to so many IPs compared to www.g.c?

  dfc@ronin:~$ host cert-test.sandbox.google.com |grep address |wc
       16      64     894
  dfc@ronin:~$ host www.google.com |grep address |wc
        6      25     262
I wanted to see how the various vendor ssl tests would handle sha256. Everything i tested had no problem, but they all took a little longer than usual because of the number of DNS results.

PS Not that it would matter but unlike other google hosts the cert-test does not have any IPv6 records.

I don't think the load balancing is setup as tightly as www.google.com. It doesn't get much traffic after all.
Yeah, I think I was just confused. An off-by-one error in applying negatives, I think.

Thank you for the article. I found it interesting to think about.

Collision resistance implies 2nd-preimage resistance, and is therefore the stronger notion.

Suppose your function is collision resistant, but not 2nd-preimage resistant. Then this means we cannot find an arbitrary pair (x, y) such that H(x) = H(y). But given a fixed x, H(x) we can find y such that H(y) = H(x). This is of course absurd, since we can easily turn the latter 2nd-preimage attack into a collision attack by fixing some arbitrary x ahead of time.

It is also the case in practice that collisions are easier to find. This is owed to the fact that a collision attack has much more degrees of freedom given to the attacker than a (second) preimage attack.

The sentence is saying that collision resistance seems to be practically harder to achieve than second order preimage resistance.

One reason is a lot of hash functions are iterative in nature (you have a compression function repeatedly applied to a state, mixing the message into the state in small pieces). For collision resistance, finding a collision in just the compression function (or a limited number of applications of it) is enough to find a collision in the whole hash function (you can 'work outwards' from the collision). You can't use that effect in second order preimage resistance.

IANAC.

One way to look at it: collision resistance prevents more attacks than mere 2nd preimage resistance, so that is a "higher" barrier. As you say, a certain type of attack would be to arbitrarily pick the first message and then try to find a collision, but that isn't the only conceivable attack. You might be able to run an attack that would generate message pairs that would have a higher likelihood of defeating a particular hash, if the hash had 2nd preimage but not collision resistance.

EDIT: I agree with sibling comments.

For those who have this question and somewhat less cryptography experience, consider the birthday problem: it's much harder to find someone in a crowded room that shares your birthday than it is to find two people who share a birthday.