Hacker News
by pas 2291 days ago
Oh, I've used rsync many times, but I thought it simply retransmitted changed files. (Oh, it needs the --checksum argument to do this, okay.)

So how do these self-synchronizing hashes work? Like a Merkle Tree? (Ah, okay https://en.wikipedia.org/wiki/Rsync#Determining_which_parts_... )
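For anyone else wondering about the question above, here is a minimal sketch of the weak rolling checksum as I understand it from the rsync technical report: two 16-bit sums that can be slid one byte forward in constant time. The function names `weak_checksum` and `roll` are made up for illustration, and this is not rsync's actual source. It is not a Merkle tree; the point is only that moving the window does not require rehashing the whole block.

```python
M = 1 << 16  # modulus from the rsync technical report (assumption: sketch only)

def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the pair (a, b) from scratch for one block."""
    n = len(block)
    a = sum(block) % M                                   # plain byte sum
    b = sum((n - i) * x for i, x in enumerate(block)) % M  # position-weighted sum
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide a window of length n one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

# Check that rolling agrees with recomputing at every offset.
data = bytes(range(256)) * 4
n = 64
a, b = weak_checksum(data[:n])
for k in range(len(data) - n):
    a, b = roll(a, b, data[k], data[k + n], n)
    assert (a, b) == weak_checksum(data[k + 1 : k + 1 + n])
```

A match on this cheap checksum is only a candidate; a stronger hash of the block is then compared to confirm it.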

So rsync uses 8KB for chunk size, so a 1GB file has 125,000 chunks. (And if every chunk needs 16 bytes of hash data sent, that's about 2MB, which is pretty darn efficient, especially if it can spot reorders.) Though according to Wikipedia it only does this if the target file has the same size, so adding new files to ISOs might not work with rsync. Still, the possibility is there for diff algorithms and version control systems.
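The back-of-the-envelope numbers above can be checked in a few lines. This assumes decimal units (1GB = 10^9 bytes, 8KB = 8,000 bytes) and takes the 8KB chunk size and 16 bytes of hash per chunk from the comment as given, not as verified rsync defaults.

```python
FILE_SIZE = 1_000_000_000  # 1 GB, decimal units (assumption)
CHUNK_SIZE = 8_000         # 8 KB per chunk, figure taken from the comment
HASH_BYTES = 16            # hash data sent per chunk, figure taken from the comment

chunks = FILE_SIZE // CHUNK_SIZE       # number of chunks in the file
signature_bytes = chunks * HASH_BYTES  # total hash data to send

print(chunks)                          # 125000
print(signature_bytes / 1_000_000)     # 2.0 (MB)
```

That is 2MB of hash data for a 1GB file, or 0.2% of the file size.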

1 comment

No, the target doesn’t have to be the same size. As an optimization, if the size and datetime are the same, rsync will assume no change and will not hash at all (though you can force it to).

But it will definitely use hashes when the size differs (unless forced to copy whole files, or copying between local file systems).
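A rough sketch of the decision described in this reply, written as a small function. It models only what the reply says, not rsync's real code path, and the names `plan`, `ignore_quick_check`, `whole_file`, and `local_copy` are hypothetical parameters invented for illustration.

```python
from enum import Enum

class Action(Enum):
    SKIP = "skip"             # assume unchanged, no hashing
    DELTA = "delta transfer"  # use block hashes to send only differences
    WHOLE = "whole-file copy"

def plan(src_size, src_mtime, dst_size, dst_mtime, *,
         ignore_quick_check=False, whole_file=False, local_copy=False):
    """Pick an action from file metadata (simplified model of the reply)."""
    same_metadata = src_size == dst_size and src_mtime == dst_mtime
    if same_metadata and not ignore_quick_check:
        return Action.SKIP   # same size and datetime: assume no change
    if whole_file or local_copy:
        return Action.WHOLE  # hashing is bypassed in these cases
    return Action.DELTA      # a size difference does not prevent hashing

# A grown file (e.g. an ISO with files added) still gets the delta treatment.
print(plan(1_000, 1700000000, 1_200, 1700000500))
```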