by octoberfranklin
2111 days ago
Rolling hashing is really only useful for finding nonaligned duplicates. There isn't a way to advertise some "rolling hash value" in a way that allows other people with a differently-aligned copy to notice that you and they have some duplicated byte ranges. Rolling hashes only work when one person (or two people engaged in a conversation, like rsync) already has both copies.
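The rolling hash is used to find the chunk boundaries: at every byte position, hash the window of bytes that ends there (which is cheap with a rolling hash) and compare the result against a defined bit mask. For example, check whether the lowest 20 bits of the hash are zero. If so, cut a chunk at that position. That gives chunks with an average length of about 2^20 bytes (1 MiB), and because each boundary depends only on the bytes in the window, two differently-aligned copies end up with the same chunks wherever their content matches.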
For a good explanation, I'd encourage you to look at borgbackup's internals documentation: https://borgbackup.readthedocs.io/en/stable/internals.html
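
To make the boundary rule concrete, here is a minimal sketch in Python. It is not borgbackup's chunker (borg uses a buzhash with its own parameters); it swaps in a simple polynomial rolling hash, and the names and constants (`WINDOW`, `MASK_BITS`, `BASE`, `MOD`, `chunk_boundaries`) are my own assumptions for illustration. It also skips the minimum and maximum chunk sizes a real tool would enforce.

```python
# Illustrative constants, not taken from any real tool.
WINDOW = 48                  # bytes covered by the rolling window
MASK_BITS = 12               # cut when the low 12 bits are zero (~4 KiB average)
MASK = (1 << MASK_BITS) - 1
BASE = 257
MOD = (1 << 61) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window


def chunk_boundaries(data: bytes) -> list[int]:
    """Return the end offsets of content-defined chunks in data."""
    cuts = []
    h = 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            # Roll: drop the oldest byte's contribution from the hash.
            h = (h - data[i - WINDOW] * TOP) % MOD
        # Shift the window forward and add the new byte.
        h = (h * BASE + b) % MOD
        # Only test once the window is full; cut where the masked bits are zero.
        if i + 1 >= WINDOW and (h & MASK) == 0:
            cuts.append(i + 1)
    # The tail after the last cut is a chunk too.
    if not cuts or cuts[-1] != len(data):
        cuts.append(len(data))
    return cuts
```

With `MASK_BITS = 20` you would get the roughly 1 MiB average from the example above; 12 just keeps the chunks small enough to play with. If you prepend `k` bytes to the input, every boundary found in the original shows up again `k` bytes later, so the chunks after the insertion point are identical and each one can be identified by an ordinary hash of its contents.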