by aurelian15
2108 days ago
There are ways around this. See "content-aware chunking", implemented e.g. using rolling hashes [1]. rsync, for example, relies on a rolling hash. The idea behind content-aware chunking is to make blocks (slightly) variable in size: block boundaries are determined by a limited window of preceding bytes, so a change in one location only affects a limited number of the following blocks.

[1] https://en.wikipedia.org/wiki/Rolling_hash
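
To make the idea concrete, here is a minimal Python sketch of content-aware chunking. It is an illustration only, not how rsync or any particular tool does it. The window size, mask, hash base and modulus, and the names `chunk` and `new_chunks` are all assumptions picked for the example.

```python
import hashlib
import random

# All constants below are illustrative assumptions, not values from any real tool.
WINDOW = 16                       # bytes of context that decide a boundary
MASK = (1 << 6) - 1               # cut when the low 6 hash bits are zero (~64-byte chunks)
BASE = 257                        # polynomial hash base
MOD = (1 << 31) - 1               # polynomial hash modulus
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window


def chunk(data: bytes) -> list[bytes]:
    """Split data where a rolling hash of the last WINDOW bytes hits the mask."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD  # drop the oldest byte
        h = (h * BASE + b) % MOD                    # add the newest byte
        # The hash is never reset, so a boundary depends only on the
        # WINDOW bytes before it, not on where earlier boundaries fell.
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                 # trailing partial chunk
    return chunks


def new_chunks(old: list[bytes], new: list[bytes]) -> list[bytes]:
    """Return the chunks of `new` whose SHA-256 digest does not occur in `old`."""
    seen = {hashlib.sha256(c).digest() for c in old}
    return [c for c in new if hashlib.sha256(c).digest() not in seen]


if __name__ == "__main__":
    rng = random.Random(0)
    data = bytes(rng.randrange(256) for _ in range(4096))
    edited = data[:100] + b"X" + data[100:]         # insert one byte near the start
    before, after = chunk(data), chunk(edited)
    print(len(after), "chunks,", len(new_chunks(before, after)), "of them new")
```

Because a boundary depends only on the preceding WINDOW bytes, the one-byte insertion can only alter chunks that overlap the inserted byte and the WINDOW bytes after it. Everything further along is cut at the same content positions and hashes to the same digests. Real implementations usually also enforce minimum and maximum chunk sizes, which this sketch leaves out.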
There isn't a way to advertise some "rolling hash value" such that other people with a differently aligned copy can notice that you and they have some byte ranges in common.
Rolling hashes only work when one person (or two people engaged in a conversation, as with rsync) already has both copies.