Hacker News new | ask | show | jobs
by droffel 1115 days ago
Merkle trees! You (simplified) hash the data at the leaves of the tree, merge adjacent leaves (concatenation for example), and hash the new leaves, until you're left with just the Merkle Root, a single hash representing the entirety of your data. Verifying the root is easier than loading the data itself into memory and hashing it there since you can verify it piecemeal and without loading it all into memory at once.
1 comments

But regardless of the ordering which you read, don't you ultimately still need to read every bit once?
Each level of the tree contains a hash of the hashes from the level below, so to verify the top hash you need only hash all its children (which are hashes themselves). Then you can explore a child, continuing recursively. At the bottom you find a hash of an actual file (or part of a file), then hash the file to check it’s validity.

This allows you to only hash those parts of the tree you actually want to read.

If you want to verify the entire image, I don’t think you can get around reading the entire image. Because any part you didn’t read to verify the hash is a part that could contain corruption.
> Big Sur, Monterey or Ventura in default Full Security mode has verified every last bit of their 9 GB SSV.

In this case they're not selective, so the tree approach doesn't save the reads. (Just makes it easier to identify which part failed)

That's around 3s, for their slowest offering.