Hacker News new | ask | show | jobs
by Everlag 3025 days ago
This is a well known edge case of Merkle trees; my undergrad security course covered this as an aside and it is prominent on the wikipedia page.[0]

For those not familiar with the specific implementation issue, this post should be a fun read.

I don't know what's more concerning, that pymerkletools advertises itself for use with Bitcoin or that none of the contributors read the wiki page :|

[0] https://en.wikipedia.org/wiki/Merkle_tree#Second_preimage_at...

2 comments

Because Bitcoin's Merkle tree is vulnerable to it as well :)

The fix came through a higher order property (duplicate TXs not allowed anyways).

https://bitcointalk.org/index.php?topic=102395.0

It's a bit strange to consider this even an edge case, and weird that one would even create a diagram (such as the one on the wikipedia page) that doesn't have a solution (such as a leaf node marker prepended). Why isn't the prepended leaf node marker included in the concept of a Merkle tree? Is there ever a situation where you'd want a Merkle tree that allowed this?
The original use case for Merkle trees were to compactly represent sets of one-time-use public keys. In that context, an attacker trying to exploit this sort of collision would end up presenting an interior node as a public key leaf node, but the size of an interior node is too small to be a valid public key, so no party that performs sanity checks on the public keys would be fooled by the attacker. In that case, tagging nodes as leaves or internal nodes is arguably a tiny tiny bit wasteful. That's the only argument I can think of, and it's a poor one.

Edit: Okay, the other valid argument for not using prefixes to tag leaves and interior nodes is that you're using the provably secure Sakura construction, which using suffixes rather than prefixes. There are a few advantages to using suffixes, such as being able to have a single node containing both data and child nodes without having to know the length of the data portion before starting to hash the node. There's also better performance due to memory alignment when hashing memory-mapped files (if using block sizes that pack well into memory pages and cache lines) if you use suffixes. But, suffixes vs. prefixes is a tiny nit to pick.

Okay, I guess in mathematical terms a better way to express the whole thing would be in terms of two hash functions:

h_leaf(leaf) which takes a leaf.

h_branch(branch_left, branch_right) which takes the two branches.

The important point being that one should not be able to find a leaf such that h_leaf(leaf) = h_branch(branch_left, branch_right) for any branch_left, branch_right.

Prefixes versus suffixes are just implementation details of the hash functions (i.e. h_leaf(x) = "\00" + sha256(x), h_branch(x,y) = "\01" + sha256(x + y) would also work).