by beagle3
2291 days ago
Sometimes they do: for example, if you replace a file in the ISO with one of the same size (up to block alignment), which is common when editing a text file or recompiling an executable with a minor change. They almost always do when the image is a VM disk, since each write changes only some blocks.

However, with self-synchronizing hashes of the kind used by rsync, bup and borg, it doesn't matter. You could have a 1TB file, delete a single byte at position 100, and you would only need to store or transfer one new block (about 8KB on average with bup's defaults; rsync picks a block size per file, and borg's chunk size is configurable), provided you already have a copy of the version before the change.

It's somewhat comparable to diff/patch, but not exactly. It's worse in that change granularity is only specified on average. It's better in that it works well on binary files, does not require a specific reference version to diff against (it can reference all previous history), and efficiently handles reordering as well as small changes. If you divide a 4000-line text file into four 1000-line sections and reorder them 1,2,3,4 -> 3,1,4,2, you will find the diff/patch to be as long as a new copy, whereas a self-synchronizing hash decomposition of the reordered file will take hardly any space given the original.
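A minimal Python sketch of content-defined chunking, to show why a one-byte deletion only costs one new chunk. It assumes a Gear-style rolling hash with a hypothetical table of random 64-bit values; this is not the exact rolling hash bup or borg uses, and `chunk`, `avg_bits`, `min_size` and `max_size` are illustrative names. Boundaries are placed wherever the hash of the most recent 64 bytes hits a bit pattern, so they depend only on nearby content, not on absolute offsets.

```python
import random

# Hypothetical Gear table: one random 64-bit value per possible byte value.
# The fixed seed keeps chunk boundaries reproducible between runs.
_rng = random.Random(2024)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def chunk(data: bytes, avg_bits: int = 13, min_size: int = 2048,
          max_size: int = 65536) -> list[bytes]:
    """Split data into content-defined chunks.

    Once a chunk is at least min_size bytes long, a boundary fires with
    probability about 2**-avg_bits per byte, so chunks average roughly
    min_size + 2**avg_bits bytes. min_size should be at least 64 so the
    hash has seen a full 64-byte window before a boundary can fire.
    """
    shift = 64 - avg_bits
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left by one per byte pushes each byte's contribution out
        # of the 64-bit state after 64 steps, so h depends only on the last
        # 64 bytes.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        # Cut where the top avg_bits bits are zero, or force a cut at max_size.
        if (size >= min_size and h >> shift == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

To use it as a dedup store, keep the chunks of the old version in a set (or a dict keyed by a strong hash); chunking the edited version then yields only a handful of chunks that aren't already stored. The same holds for the 3,1,4,2 reordering, apart from the few chunks that straddle section boundaries.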
So how do these self-synchronizing hashes work? Like a Merkle tree? (Ah, okay: https://en.wikipedia.org/wiki/Rsync#Determining_which_parts_... )
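rsync's scheme uses two levels: a cheap weak checksum over each block, confirmed by a strong hash only when the weak one matches. It isn't a Merkle tree; the trick is that the weak checksum of the window starting one byte later can be updated from the previous one in constant time, so the sender can test every byte offset. A minimal Python sketch of that weak checksum as given in the rsync technical report by Tridgell and Mackerras (real rsync differs in details, and the function names here are hypothetical):

```python
M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the (a, b) pair for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte of the block is weighted n, the last byte 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def combine(a: int, b: int) -> int:
    """Pack the pair into a single 32-bit weak checksum."""
    return a + (b << 16)


def all_window_checksums(data: bytes, n: int) -> list[int]:
    """Weak checksum of every n-byte window, using roll() after the first."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    sums = [combine(a, b)]
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        sums.append(combine(a, b))
    return sums
```

In the protocol, the sender keeps a table of the receiver's block checksums and looks up each rolled value; only on a hit does it compute the expensive strong hash of that window.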
So with 8KB chunks, a 1GB file has 125 000 chunks. (If every chunk needs 16 bytes of hash data to send, that's about 2MB, which is pretty darn efficient, especially if it can spot reordering.) The file-size condition mentioned on Wikipedia seems to refer to rsync's default quick check (size and modification time), which only decides whether a file is examined at all; the block matching itself does not require the old and new versions to be the same size, so adding new files to an ISO should still transfer efficiently. Either way, the possibility is there for diff algorithms and version control systems.
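A minimal Python sketch of that arithmetic together with the receiver/sender exchange it pays for. It assumes fixed-size blocks and 16 bytes of hash per block, as in the estimate above; for brevity it hashes every candidate window with MD5 instead of filtering with rsync's rolling weak checksum first, so it is slow and only meant to show the shape of the protocol. `signature_size`, `signature`, `delta` and `patch` are hypothetical names, and the new file is allowed to differ in size from the old one.

```python
import hashlib


def signature_size(file_size: int, block: int, bytes_per_block: int = 16) -> tuple[int, int]:
    """Number of blocks and total signature bytes the receiver sends."""
    blocks = -(-file_size // block)  # ceiling division
    return blocks, blocks * bytes_per_block


def signature(old: bytes, block: int) -> dict[bytes, int]:
    """Receiver: map the MD5 of each full block of the old file to its index."""
    sig = {}
    for idx in range(len(old) // block):
        digest = hashlib.md5(old[idx * block:(idx + 1) * block]).digest()
        sig.setdefault(digest, idx)
    return sig


def delta(new: bytes, sig: dict[bytes, int], block: int) -> list[tuple]:
    """Sender: describe new as ('copy', index) and ('data', bytes) operations."""
    ops, literal, i = [], bytearray(), 0
    while i < len(new):
        window = new[i:i + block]
        idx = sig.get(hashlib.md5(window).digest()) if len(window) == block else None
        if idx is None:
            literal.append(new[i])  # no match at this offset: send the byte as-is
            i += 1
            continue
        if literal:
            ops.append(("data", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", idx))  # receiver already has this block
        i += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list[tuple], block: int) -> bytes:
    """Receiver: rebuild the new file from its old copy plus the operations."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * block:(value + 1) * block] if kind == "copy" else value
    return bytes(out)
```

For example, `signature_size(10**9, 8000)` reproduces the 125 000 blocks and 2 000 000 bytes above, and inserting a few bytes into the middle of a file typically costs about one block of literal data plus the inserted bytes, with everything else sent as copy operations.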