Hacker News new | ask | show | jobs
by qwename 2192 days ago
Other than using a stronger hashing algorithm that produces longer hashes, would there be any advantage in storing two or more separate hashes of an object? The extra hashes could be from a different hash function, or a hash of the reversed bits/bytes.

I wonder about the difficulty of producing collisions for a single 256-bit hash function versus two 128-bit hash functions, four 64-bit hash functions and so on.

3 comments

It looks like you've already got the jist of this from the second and third link in a grandchild post. Anyway!

For a bit of a hands-on view, there is a terrific cryptography coding exercise that explores this idea[0], and you take it all the way in actually breaking the construction (although the exercise is concerned with collisions, rather than preimages). The linked exercise has you homebake your own de-fanged hash functions, so that you can run solution code within your lifetime (as well as giving you a closer view of the machinery).

Of course, the exercise applies to fairly raw, Merkle-Damgard-like constructions.

What I took away from the challenge was that I would feel better using a single, longer hash than two shorter, concatenated hashes, because you are bottlenecking your construction to your stronger hash (if they differ in strength).

An auditor or attacker with white box access can attack your smaller hash (if your two hash functions differ in bit-size) to generate a 2^(n/2)-way collision (that is, if it's feasible to collide the hash at all, there is a clever trick to generate a massive number of messages that all collide to the same output), then birthday attack the bigger hash using the messages from that collision-pool. (Importantly, this mention of "n" specifically refers to the bit-size of the larger hash, a.k.a., the bottleneck.)

Although there might be something about it at first glance that makes the messages feel not random enough (in order for the birthday principle to hold), it's far easier to tell you to just do the exercise in order to grok why it works than for me to figure out how to explain it :)

This exercise is the first of an awesome triple-whammy[1][2] of enlightening hash function challenges.

[0] https://cryptopals.com/sets/7/challenges/52 [1] https://cryptopals.com/sets/7/challenges/53 [2] https://cryptopals.com/sets/7/challenges/54

That reminds me of my phone. It is kinda special. Instead of an eleven digit number, I have eleven one digit phone numbers. When I write them down, I usually just writen them all in one row. So the written version of my eleven phone numbers look just like a normal eleven digit phone number.
I will try to show the difference between one 256-bit hash function and two 128-bit hash functions.

Let h_0(x) be a 256-bit hash function. Let h_1(x) and h_2(x) be 128-bit hash functions.

Let h_3(x) = h_1(x) + h_2(x), h_4(x) = h_2(x) + h_1(x), where '+' is the concatenation operator. Thus h_3(x) and h_4(x) are also 256-bit hash functions.

Given an object x_1, find x_2 that satisfies one of the following conditions:

(1) h_0(x_1) = h_0(x_2)

(2) h_3(x_1) = h_3(x_2) AND h_4(x_1) = h_4(x_2)

Finding a collision for a single 256-bit hash function would be trying to solve (1), but finding collisions for two 128-bit hash functions would be trying to solve (2).

Also note that (2) is not equivalent to finding collisions for two 256-bit hash functions.

I think the misconception comes from the fact that the multiple hashes form an unordered set, not an ordered concatenation.

If there's any mistake in my logic please feel free to point them out.

Edit: (2) can be simplified into h_1(x_1) = h_1(x_2) AND h_2(x_1) = h_2(x_2), as concatenation of the hashes does not turn h_3 and h_4 into "real" 256-bit hash functions.

This makes sense. However the implementation detail of h_0(x) could also be foo(x) + bar(x) or whatever. At the end, you have a function that returns a fixed number of bytes.

I doubt any has function that we use today changes the algorithm halfway through its operation though. So your proposal might have a benefit in security by obscurity way.

I suppose I missed the assumption that h_0, h_1 and h_2 cannot be trivially decomposed into concatenations of other hash functions, whereas h_3 and h_4 can.

Also, there's no obscurity since h_1 and h_2 are the actual targeted hash functions, h_3 and h_4 are just to show that concatenation doesn't change the actual problem or make it harder.

What?
A function that concatenates the results of two 64-bit hash functions _is_ a 128-bit hash function.

Just as a string of individual digits makes a larger number.

I mean, technically yes, but it it would be a 128-bit hash function with the security properties of a 64-bit hash function, so it offers little advantage over just using a 64bit hash (which I think was also the point you were trying to make, I think?).

However, that doesn't really address the original question on how much harder cracking 2x64bit hash would be than cracking a single 128bit hash would be.

My best guess there would be that it's really quantify as you start opening up more dimensions besides the number of bits. The gain would mostly come from protection against other properties from one of the algorithms like a potentially hidden backdoor or a undiscovered mathematical weakness. So as long as the strength of the individual hash function holds up, it probably makes sense to diversify between hash functions. E.g. SHA3-256 + BLAKE3-256, probably offers better long-term security properties than using SHA3-512.

> but it it would be a 128-bit hash function with the security properties of a 64-bit hash function

This is not true. Consider two hash functions f and g

    f(x) = md5(x)[0..63]
    g(x) = md5(x)[64..127]
and a third function

   h(x) = f(x) || g(x)
where || is concat

So no, concating multiple smaller hash functions is not any weaker than using a single big one.

So your point is that if you take the output of the _same_ 128bit hash function twice, split it into 64bit parts and then put it back together, you still have the full properties of the 128bit hash function? Well, no shit.

I have to admit that I'm not the greatest cryptography whiz, but I can't image that this holds up for _independent_ hash functions, where you should be able to more cheaply run a preimage attack against two 64bit hash functions than one 128bit hash function.

My guess: To paths to the same number string are the same number. ie: two 64 bit numbers is a 128 bit number in disguise.
Hash security is a complicated beast. There were some research results that concatenating multiple hash functions isn't much more secure than the best of the concatenated functions. It's not a good way to produce more secure hashes.

Also please note that length is only one measurement of hash security. The fact that SHA1 is weak has more reasons. If it was a good hash function it would still have a security of 80 bits, which would not be a comfortable security margin, but still kinda not really broken for real. But further weaknesses reduced the attack complexity.

Likewise if you think about SHA256 this is not just the 256-bit-instead-of-160-bit-version of SHA1. It's a different function (although based on simliar constructions) without the weaknesses of SHA1.

After finding the right search terms (concatenating hash functions), I found a few stackexchange discussions about this, which lead me to other methods like truncating a stronger hash function[1], and chaining hash functions[2]. Apparently TLS already concatenates MD5 and SHA1 [2][3].

Given that the article is about collision attacks and not preimage resistance, that was my main thought when thinking of the issue. I'll leave it to the experts to figure out what's the best for cryptographic hash functions.

[1] https://crypto.stackexchange.com/questions/9435/is-truncatin...

[2] https://crypto.stackexchange.com/questions/270/guarding-agai...

[3] https://en.wikipedia.org/wiki/Cryptographic_hash_function#Co...

> Apparently TLS already concatenates MD5 and SHA1 [2][3].

Luckily TLS no longer does such things. This was a bad workaround in old versions of TLS. Which they then replaced by a "you can use secure or insecure hash functions, you choose" in TLS 1.2 (which is hard to excuse - at the time TLS 1.2 was written the weaknesses in SHA1 and MD5 were well known). In TLS 1.3 finally they did the right thing and only support secure hash functions.