by btschaegg
2111 days ago
I think you misunderstood how the rolling hash is used in this context. It's not used to address a chunk; you'd use a plain old cryptographic hash function for that. The rolling hash is used to find the chunk boundaries: hash the window of bytes ending at each position (which is cheap with a rolling hash) and compare the result against a defined bit mask. For example: check whether 20 bits of the hash (say, the lowest 20) are zero. If so, you cut there, and you'd get chunks with an average length of about 2^20 bytes (1 MiB). For a good explanation, I'd encourage you to look at borgbackup's internals documentation: https://borgbackup.readthedocs.io/en/stable/internals.html
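To make that concrete, here is a rough sketch of the idea in Python. It is not borgbackup's actual chunker: the polynomial rolling hash, the 48-byte window, the 10-bit mask (about 1 KiB average chunks, so the demo stays small) and all function names are my own assumptions for illustration.

```python
import hashlib

WINDOW = 48            # bytes in the rolling window (assumed value)
MASK_BITS = 10         # cut when the low 10 bits are zero: ~1 KiB average chunks
BASE = 257             # multiplier of the polynomial hash (assumed value)
MOD = (1 << 61) - 1    # prime modulus for the polynomial hash (assumed value)


def chunk_boundaries(data, window=WINDOW, mask_bits=MASK_BITS):
    """Return the end offsets of content-defined chunks in data."""
    if not data:
        return []
    mask = (1 << mask_bits) - 1
    top = pow(BASE, window - 1, MOD)  # weight of the byte that leaves the window
    h = 0
    cuts = []
    for i, b in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % MOD  # drop the oldest byte
        h = (h * BASE + b) % MOD                    # take in the newest byte
        # Only test once the window is full; the decision then depends
        # solely on the last `window` bytes, not on the absolute offset.
        if i + 1 >= window and (h & mask) == 0:
            cuts.append(i + 1)
    if not cuts or cuts[-1] != len(data):
        cuts.append(len(data))  # the tail always forms the final chunk
    return cuts


def split_chunks(data):
    """Split data at the boundaries found by the rolling hash."""
    chunks, start = [], 0
    for end in chunk_boundaries(data):
        chunks.append(data[start:end])
        start = end
    return chunks


def chunk_ids(data):
    """Address each chunk with a cryptographic hash, not the rolling hash."""
    return [hashlib.sha256(c).hexdigest() for c in split_chunks(data)]
```

Since a cut depends only on the bytes inside the window, inserting a byte at the front of a file shifts every later boundary by one instead of moving it somewhere else, so everything after the first chunk or two should still hash to the same IDs.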
|
If I discover that the file I want to publish shares a range with an existing file, that does very little because the existing file has already chosen its chunk boundaries and I can’t influence those. That ship has sailed.
I can only benefit if the a priori chunks are small enough that some subset of the identified match is still addressable. And then I may only get half or two thirds of the improvement I was after.
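A back-of-the-envelope way to put numbers on that, as a sketch only: the helper name and the boundary values below are made up for illustration and don't come from any real tool. It counts how many bytes of a matching range are covered by existing chunks that lie entirely inside the range, since only those can be referenced by their hash.

```python
def addressable_overlap(boundaries, match_start, match_end):
    """Bytes of [match_start, match_end) covered by whole existing chunks.

    `boundaries` holds the end offsets of the existing file's chunks,
    in ascending order, with the first chunk starting at offset 0.
    """
    covered = 0
    start = 0
    for end in boundaries:
        # Only a chunk lying fully inside the match can be reused as-is.
        if start >= match_start and end <= match_end:
            covered += end - start
        start = end
    return covered


# Hypothetical 4000-byte file sharing bytes 1500..3800 (2300 bytes) with mine.
coarse = [1000, 2000, 3000, 4000]       # existing chunks of 1000 bytes each
fine = list(range(250, 4001, 250))      # existing chunks of 250 bytes each

print(addressable_overlap(coarse, 1500, 3800))  # 1000 of 2300 bytes
print(addressable_overlap(fine, 1500, 3800))    # 2250 of 2300 bytes
```

With the coarse chunks less than half of the match is reusable; with the finer ones nearly all of it is, at the cost of tracking four times as many chunks.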