Hacker News new | ask | show | jobs
by ivmaykov 2664 days ago
Disclaimer: I’m one of the authors of the paper/blog post/code.

If you want to use signatures over the hash as proof of data set integrity, you need two things. 1) you need to make sure that hash({a}) + hash({b}) == hash({a, b}). 2) ensure that hash() is collision resistant - in other words, it needs to be computationally infeasible to find hash(S) == hash(T), S != T for any sets S and T. We prove that LtHash with our choice of parameters has this property in the paper (which is linked from the blog post).

3 comments

My reading of the post is that the Hash({a, b}) is infact computed as Hash'(a) + Hash'(b) given that a and b are "rows". And thus my question is why Hash' has to have any special properties.
I can think of hash functions that are homomorphic, but are not secure. A simple example is something like “sha256 each element separately and XOR all the resulting hashes together.” This would not be collusion resistant.

We offer a proof that LtHash with our choice of parameters provides over 200 bits of security. You would have to read the paper for the details.

I can see how you might lose collision resistance with a normal 256-bit hash function, but it's not clear why a "stretched" hash with 2048 bytes of output wouldn't work.

E: Ah, there it is, in the paper:

> However, Wagner [Wag02] later showed an attack on the generalized birthday problem which could be used to find collisions > for AdHash on an n-bit modulus in time O(2^(2√n)), and that the AdHash modulus needs to be greater > than 1600 bits long to provide 80-bit security. Lyubashevsky [Lyu05] and Shallue [Sha08] showed > how to solve the Random Modular Subset Sum problem (essentially equivalent to finding collisions > in AdHash) in time O(2^(n^ε)) for any ε < 1, which indicates that AdHash requires several more orders > of magnitude larger of a modulus just to provide 80-bit security.

Assuming a trusted system, could you go into more detail on the qualities or guarantees a Merkle tree using XOR would or would not give you and partial mitigations for any of these disadvantages? i.e. How does the entropy of a leaf node's 256-bit signature disperse into the tree up to the root? How does the birthday paradox come into play with regards to collisions at the root of the tree?

For synchronization purposes in a trusted system, a Merkle tree based on XOR seems elegant and efficient, but I can't seem to find accessible papers on this.

Just so I have this clear: is the issue the simple solution above doesn't work (at all) with multisets, and the birthday problem combined with potentially billions of rows means even sha256 will have a significant chance generating a collision?
Are you guys going to use this in your blockchain?