by bmalehorn 3623 days ago
Block. It deduplicates based on 4MB blocks.
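
For anyone unfamiliar with block-level dedup, here is a minimal sketch of the idea in Python. Only the 4MB block size comes from the comment above. The name `store_blocks`, the choice of SHA-256, and the in-memory dict are assumptions made for illustration, not a description of the service's actual implementation.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4MB, the figure given in the comment above


def store_blocks(data, block_store, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and store only blocks not seen before.

    block_store is a dict mapping block hash -> block bytes (a stand-in for
    whatever backend a real system would use).
    Returns (recipe, new_blocks): the ordered list of block hashes needed to
    rebuild the data, and how many blocks actually had to be stored.
    """
    recipe = []
    new_blocks = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in block_store:
            block_store[digest] = block  # first time this content is seen
            new_blocks += 1
        recipe.append(digest)
    return recipe, new_blocks


def rebuild(recipe, block_store):
    """Reassemble the original bytes from a recipe of block hashes."""
    return b"".join(block_store[digest] for digest in recipe)
```

Because the block boundaries sit at fixed offsets, a file with even one byte inserted at the front produces entirely different blocks, and nothing gets reused.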

Source: https://news.ycombinator.com/item?id=2478595

1 comments

I wonder if they use a rolling checksum too, to avoid duplicating a complete file if only a few bytes shifted (for example, adding a line of text at the beginning of a file).

The backup tool bup (https://github.com/bup/bup) does this.
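
To make the rolling checksum idea concrete, here is a rough sketch of content-defined chunking in Python. It is not bup's algorithm or parameters. The window size, the mask, the plain byte-sum checksum, and the names `cdc_chunks` and `chunk_ids` are all assumptions chosen to keep the example short.

```python
import hashlib

WINDOW = 16   # bytes covered by the rolling checksum (illustrative value)
MASK = 0x3F   # cut when the low 6 bits are all ones, roughly every 64 bytes


def cdc_chunks(data, window=WINDOW, mask=MASK):
    """Split data where a rolling sum of the last `window` bytes hits a pattern.

    Boundaries depend only on nearby content, not on absolute offsets, so
    they move along with the data when bytes are inserted earlier in the file.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte that left the window
        if (rolling & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk with no boundary
    return chunks


def chunk_ids(data):
    """Set of SHA-256 hashes of the content-defined chunks of data."""
    return {hashlib.sha256(chunk).hexdigest() for chunk in cdc_chunks(data)}
```

Comparing `chunk_ids(data)` with `chunk_ids(b"X" + data)` on reasonably varied data should show that only the chunks near the insertion differ, while everything after the first unaffected boundary is reused. A real tool would use a stronger rolling hash and much larger average chunk sizes.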

They almost certainly do not, mostly because doing so is slow.
It probably wouldn't hit the most important cases either: dedup is typically most powerful and valuable on large media files, software packages, disk ISOs, and the like, which do not frequently have arbitrary text inserted at the start of the file.