Hacker News new | ask | show | jobs
by saltcured 1175 days ago
Backup tools don't necessarily model mutation, since their purpose is to encode immutable snapshots of data. The modern ones that use deduped content storage also generally like to make snapshots independent of one another, so you can prune snapshots without worrying about invalidating another dependent snapshot. As a result, introducing a new version of a file likely has the same O(n) cost as backing up a new file, as long as they both contain content chunks already found in the storage area.

These backup tools also don't generally have to optimize random access to parts of a file. They may just store a linear sequence of chunk IDs to represent one file version. To bring this back to an active system that supports random access by a program with patching, I think you'd really need to adapt a copy-on-write filesystem to use content-defined chunking instead of fixed offset chunks. Then, your insertion is likely an operation on some tree structure that represents the list of chunk IDs as the leaves of the tree. But, this tree would now have to encode more byte offset info in the interior tree nodes, since it would vary at the leaves instead of each leaf representing a fixed size chunk.