by riceart
1180 days ago
But I think they're talking about content-defined chunking of a pack or archive file: inserting data should only affect the chunks that contain the insertion, plus at most the one after. And since it's content-defined, those chunks are necessarily not constant size. I.e. this is how backup tools like Arq or Duplicacy work.
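To make the "only the chunks near the insertion change" point concrete, here is a minimal sketch, not how Arq or Duplicacy actually implement it. It assumes a gear-style rolling hash; `GEAR`, `cdc_chunks`, `chunk_ids`, `demo` and the `avg_bits` parameter are all made-up names for illustration. A cut is placed wherever the top bits of the hash are zero, and since the hash only depends on the last 64 bytes, boundaries far from an edit land on the same content as before.

```python
import hashlib
import random

# Hypothetical gear table: one fixed pseudo-random 64-bit value per byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, avg_bits: int = 8) -> list:
    """Split data at content-defined cut points (average chunk ~2**avg_bits bytes).

    The hash is never reset, and shifting left by one bit per byte means bytes
    older than 64 positions fall out of the 64-bit state. A cut decision
    therefore depends only on the 64 bytes before it, not on file offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        if h >> (64 - avg_bits) == 0:  # top avg_bits bits are all zero
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_ids(chunks) -> list:
    """Content-addressed IDs, as a backup tool might use for deduplication."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def demo():
    """Chunk a random buffer, insert 10 bytes in the middle, chunk again."""
    rng = random.Random(1)
    original = bytes(rng.getrandbits(8) for _ in range(100_000))
    edited = original[:50_000] + b"INSERTED!!" + original[50_000:]
    before = cdc_chunks(original)
    after = cdc_chunks(edited)
    known = set(chunk_ids(before))
    new_ids = [cid for cid in chunk_ids(after) if cid not in known]
    return before, after, new_ids
```

Calling `demo()` and comparing `len(new_ids)` with `len(after)` should show that only a handful of chunks around the insertion need to be stored again. Real tools typically also enforce minimum and maximum chunk sizes, which this sketch leaves out.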
|
These backup tools also generally don't have to optimize random access to parts of a file. They may just store a linear sequence of chunk IDs to represent one file version. To bring this back to an active system that supports random access by a program, with patching, I think you'd really need to adapt a copy-on-write filesystem to use content-defined chunking instead of fixed-offset chunks. Then your insertion is likely an operation on some tree structure that has the list of chunk IDs as its leaves. But this tree would now have to encode more byte-offset info in the interior nodes, since chunk sizes would vary at the leaves instead of each leaf representing a fixed-size chunk.
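A rough sketch of what that tree could look like, assuming a plain binary tree where every interior node caches the total byte size of its subtree (rope-style). This is not taken from any particular filesystem, and `Leaf`, `Node`, `join`, `build`, `locate` and `replace_chunk` are hypothetical names. Random access walks down using the cached sizes, and a patch copies only the path from the root to the affected leaf.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    chunk_id: str
    size: int  # variable, because chunk boundaries are content-defined


@dataclass(frozen=True)
class Node:
    left: "Tree"
    right: "Tree"
    size: int  # total bytes under this node


Tree = Union[Leaf, Node]


def join(left: Tree, right: Tree) -> Node:
    return Node(left, right, left.size + right.size)


def build(leaves: list) -> Tree:
    """Build a balanced tree over a non-empty list of leaves."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return join(build(leaves[:mid]), build(leaves[mid:]))


def locate(tree: Tree, offset: int):
    """Map a file byte offset to (chunk_id, offset inside that chunk)."""
    if not 0 <= offset < tree.size:
        raise IndexError(offset)
    while isinstance(tree, Node):
        if offset < tree.left.size:
            tree = tree.left
        else:
            offset -= tree.left.size
            tree = tree.right
    return tree.chunk_id, offset


def replace_chunk(tree: Tree, offset: int, new_leaves: list) -> Tree:
    """Copy-on-write: return a new root in which the leaf covering `offset`
    is replaced by `new_leaves`. Subtrees off the path are shared, not copied."""
    if isinstance(tree, Leaf):
        return build(new_leaves)
    if offset < tree.left.size:
        return join(replace_chunk(tree.left, offset, new_leaves), tree.right)
    return join(tree.left,
                replace_chunk(tree.right, offset - tree.left.size, new_leaves))
```

For example, with leaves of sizes 100, 250, 70 and 300, `locate(tree, 120)` lands 20 bytes into the second chunk. If an insertion re-chunks that second chunk into two new leaves, `replace_chunk` returns a new root while the old root still describes the previous file version. A real design would want a wider, rebalancing tree (B-tree-like) rather than this binary one.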