Hacker News new | ask | show | jobs
by burmecia 2573 days ago
Another approach I can think of is using rolling hash (https://en.wikipedia.org/wiki/Rolling_hash) to split file to different size of chunks, those chunks can be saved on different sectors which don’t need to be continuous. The file keeps a list of indexes for all those chunks, just like inode did for blocks. When insert some bytes in the beginning of file, thanks for the rolling hash, most likely only the first several chunks need to be re-chunk and re-hash, thus the change is localised and can be cheaply done, all the rest chunks still untouched. When this combined with append-only storage, things are even easier because it only needs to deal with bytes around that beginning, and the update chunk index list in file.

This approach is already implemented in ZboxFS (https://github.com/zboxfs/zbox) to do content-based deduplication.