Hacker News new | ask | show | jobs
by arjie 3399 days ago
The rsync example confuses me a little bit. If you add a single bit to the front, then all the bytes are shifted into different blocks and nearly none will hash to match. But if you add a single bit, rsync still performs well. Can someone explain why that difference from the explanation?

The problem also applies to the binary delta. Adding a prefix will shift everything forward causing a diff in everything.

Bsdiff solves this with the suffix sorting. But what does rsync do? Or am I just wrong that rsync still works well? In either case, I think the offset problem makes for a more interesting motivating example for bsdiff.

1 comments

The rsync algorithm divides the file into fixed size blocks only on the sending side, then calculates checksums for all blocks. On the receiving side, it tries to match them at all offsets, not just multiples of the block size.

Thus, in your example, the first (and possibly the last) block won't be found, but all other blocks will be found, shifted by an offset of 1.

Ah, seems obvious now that you say it. Thanks!