by jonhohle
482 days ago
To make dedup[0] fast, I use a tree keyed on device ID, size, first byte, last byte, and finally SHA-256. Each level is only consulted when the previous one collides, which avoids as many reads as possible. dedup doesn't do a full file compare, because if you've found two different files with the same size, first and last bytes, and SHA-256, you've also probably won the lottery several times over and can afford data recovery. Trusting the hash without a byte-for-byte verify is also the default for ZFS deduplication, and git does something similar with size and the far weaker SHA-1. I would add a test for SHA-256 collisions, but no one seems to have found a working example yet.

[0] https://github.com/ttkb-oss/dedup
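
For anyone curious what the tiered lookup looks like in practice, here is a rough Python sketch. It is not dedup's actual code: the names (`find_duplicates`, `KEYS`, `_edge_byte`, `_sha256`) are made up for illustration, and it partitions a list of paths in passes rather than inserting into a tree, but the idea is the same. Each more expensive key is only computed for files that still collide on all the cheaper ones.

```python
import hashlib
import os
from collections import defaultdict


def _edge_byte(path, from_end):
    """Read one byte from the start or the end of a file (b"" if empty)."""
    with open(path, "rb") as f:
        if from_end:
            if os.path.getsize(path) == 0:
                return b""
            f.seek(-1, os.SEEK_END)
        return f.read(1)


def _sha256(path, chunk_size=1 << 20):
    """Hash the whole file in chunks; this is the only full read."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Cheapest key first. A later key is only computed for files that
# still share every earlier key with at least one other file.
KEYS = [
    lambda p: os.stat(p).st_dev,              # device ID
    lambda p: os.stat(p).st_size,             # size
    lambda p: _edge_byte(p, from_end=False),  # first byte
    lambda p: _edge_byte(p, from_end=True),   # last byte
    _sha256,                                  # full-content hash
]


def find_duplicates(paths):
    """Return groups of paths that agree on every key in KEYS."""
    groups = [list(paths)]
    for key in KEYS:
        next_groups = []
        for group in groups:
            buckets = defaultdict(list)
            for path in group:
                buckets[key(path)].append(path)
            # Files alone in their bucket are unique; drop them early.
            next_groups.extend(b for b in buckets.values() if len(b) > 1)
        groups = next_groups
    return groups
```

Calling `find_duplicates` on a list of file paths returns a list of lists, one per set of presumed-identical files. Like the approach described above, it trusts SHA-256 and never does a byte-for-byte compare.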