by hinkley
2112 days ago
I think they understood just fine. If I discover that the file I want to publish shares a range with an existing file, that does very little, because the existing file has already chosen its chunk boundaries and I can’t influence those. That ship has sailed. I can only benefit if the a priori chunks are small enough that some subset of the identified match is still addressable. And then I may only get half or two thirds of the improvement I was after.

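If they both used the same rolling hash function on the same or similar data, then regardless of where each file starts and ends, and regardless of when they chose the boundaries, they will share many chunks with high probability. That’s just how splitting with rolling hashes works: a boundary is placed wherever the hash of the last few bytes matches a fixed pattern, so the cut points depend on the content rather than on the offset. That is also why the chunks are variable-length.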
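
To make that concrete, here is a rough sketch of content-defined chunking. It is not any particular tool's algorithm: the gear-style hash, the `GEAR` table, the `bits` parameter, and the function names are all assumptions chosen for illustration. It also deliberately leaves out minimum and maximum chunk sizes, which real chunkers usually add.

```python
import hashlib
import random

MASK64 = (1 << 64) - 1

# Hypothetical gear table: one fixed pseudo-random 64-bit value per byte value.
# Both publishers must use the same table for their boundaries to agree.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes, bits: int = 10) -> list[bytes]:
    """Split data wherever the top `bits` bits of the rolling hash are zero.

    Each step shifts the hash left by one bit, so a byte's contribution
    leaves the 64-bit state after 64 steps. The cut decision therefore
    depends only on the previous 64 bytes, not on the file offset.
    Expected chunk size is about 2**bits bytes.
    """
    out = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & MASK64
        if (h >> (64 - bits)) == 0:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])  # trailing partial chunk
    return out


def shared_chunk_count(a: bytes, b: bytes) -> int:
    """Count distinct chunks (identified by SHA-256) present in both inputs."""
    ids_a = {hashlib.sha256(c).digest() for c in chunk(a)}
    ids_b = {hashlib.sha256(c).digest() for c in chunk(b)}
    return len(ids_a & ids_b)


def pseudo_random_bytes(n: int, seed: int) -> bytes:
    """Deterministic filler data for the demo."""
    r = random.Random(seed)
    return bytes(r.getrandbits(8) for _ in range(n))


if __name__ == "__main__":
    common = pseudo_random_bytes(100_000, seed=1)
    file_a = pseudo_random_bytes(1_000, seed=2) + common
    file_b = pseudo_random_bytes(5_000, seed=3) + common
    print(len(chunk(file_a)), len(chunk(file_b)), shared_chunk_count(file_a, file_b))
```

The two demo files put the same 100,000-byte range at different offsets, and neither one knew about the other when it was chunked. Once the hash window is fully inside the common range, both files cut at the same content positions, so only the chunks straddling the edges of the range differ.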