Hacker News new | ask | show | jobs
by mistercow 4706 days ago
>[1] Note that 128-bit hashes are at best 2^64 complexity to break; using a 128-bit hash is irresponsible based on sheer digest length.

Can a short hash which has not been weakened be lengthened by taking two hashes and concatenating?

    fixedSalt = "blah"
    longerHash = (salt, input) ->
        hash(salt + input) + hash(salt + fixedSalt + input)
Edit: Never mind. Obviously an attacker would only have to break the first half of the resulting hash.

But is there any valid way to lengthen a too-short hash? Not that it's of practical importance; I'm just curious academically.

2 comments

This is actually a pretty interesting question. The answer, at least for Merkle-Damgard hash functions (MD5, SHA-1, SHA-2, etc) is that concatenating (or "cascading") hash functions doesn't really improve the strength of the resulting construction.

Merkle-Damgard hash functions look like this:

  function MD(M, H, C):
    for M[i] in pad(M):
      H := C(M[i], H)
    return H
For message M, initial state H, and compression function C. In other words, pad the message, break it into blocks, and use the compression function to "mix" each block into the hash function's state. The final state is the result.

Antoine Joux showed that for any MD hash function, generating many collisions is not much more difficult than generating one. Here's how:

  1. Take the initial state H[0].
  2. Find two single-block messages that collide under C with H[0] as the input state. Call the result H[1].
  3. Now find another pair of single-block messages that collide under C with H[1] as the input state. Call the result H[2].
  4. Iterate as needed.
Here's the trick: each single-block collision you find is actually doubling your set of colliding messages. This is because for each block of the message, you can select either of two candidate blocks. So if the effort required to find a collision in a b-bit hash function is 2^(b/2), the effort to find 2^n such collisions is only n * 2^(b/2).

What does this mean for cascaded hash functions? Well, consider a hypothetical construction that simply concatenates a 128-bit hash function with a 160-bit function. A plan of attack might look like this:

  1. Find 2^80 colliding messages under the 128-bit function. This should take roughly 80 * 2^64 ~= 2^70.3 units of "effort".
  2. Evaluate each message under the 160-bit function. There's probably a collision in there somewhere. This will take around 2^80 work, dwarfing what we did in the first step.
The effort to find a collision under both functions is thus only about what it takes to find a collision under the stronger of the two.

Shameless plug: we will have a few problems exploring the consequences of Joux collisions in set seven of the Matasano crypto challenges.

Ah, so in fact the naive concatenating solution I gave, in addition to being just as easy because the attacker only has to break half of it, is actually even easier because the attacker has two targets to collide with.

What about non-concatenative methods? For example, could you do something like "shift and xor". For example, say you have a 4-byte hash function, so that:

   hash(salt + input)             :  0x3AF9
   hash(salt + fixedSalt + input) :  0x8034
then you shift one by a byte and xor like so

       3AF90
   xor 08034
     = 32FA4
And then you could iterate that to extend to as many bytes as you want? And then maybe you'd want to xor the first and last byte together so that nothing from the first hash remains - although now I'm thinking intuitively which generally seems to be a bad idea with crytpography.
> Ah, so in fact the naive concatenating solution I gave, in addition to being just as easy because the attacker only has to break half of it, is actually even easier because the attacker has two targets to collide with.

I wouldn't say it's easier. Remember that we need to find a single message that generates a collision under both hash functions. So our strategy is to generate a massive number of collisions for the shorter function and hope that there's one pair of messages in there that collide under the longer function.

> What about non-concatenative methods?

I think this will again boil down to finding a single message that generates a collision under both hash functions. It won't matter too much whether you XOR the hashes or concatenate them.

Hmm. It still seems to me that the simple concatenative method could still be broken at least slightly more quickly, since each function can be collision tested separately, but my brain is vaguely gesturing at comprehension about why xoring is still weak.

I feel like we should be teaching the principles of crypto to young children so that we end up with some humans that can grok it as easily as the rest of us do algebra. But there are a great many things I would want young children to be taught if I were made the Benevolent Dictator of All School Boards.

As an aside, you can use PBKDF2 with an iteration count of 1 to extend the output of a hash function to arbitrary lengths:

    output = HMAC-SHA256("passphrase", "salt" + x"00000001")
           + HMAC-SHA256("passphrase", "salt" + x"00000002")
          ...
This is useful when you want to derive keys from a (strong) master secret, but if you don't trust the underlying algorithm, all bets are off.
Hmm. That seems like it's still susceptible to the weakness sdevlin mentioned. For c = 1, it is the same thing that I suggested initially, and for larger c, it merely makes the hash more expensive to compute by iterating it, but the concatenation weakness would still hold, wouldn't it? It seems to me that this is perhaps useful for generating keys from passphrases, but not for lengthening a hash used to store a password in a database.
Yes, I think so. IIUC, with this construction, the difficulty only increases linearly with the length of the output, rather than exponentially, as one might expect.

I don't think there's a generically secure way to extend short hash functions to get an exponential difficulty increase. Otherwise, we could just construct arbitrary-length hash functions using small (e.g. 32-bit) building blocks without needing to cryptanalyze the result.

But then again, I haven't been paying attention to the literature lately.