by MBCook
480 days ago
You can start with the size, which is probably close to unique. That would likely cut down the search space fast. At that point maybe it’s better to just compare byte by byte? You have to read the whole file to generate the hash anyway, and if you just compare the bytes there is no chance of a hash collision, however unlikely one would be. Plus, if you find a difference at byte 1290 you can stop right there instead of reading the whole thing to finish the hash. I don’t think John has said exactly how he does it on ATP (his podcast with Marco and Casey), but as a longtime listener/reader I’d expect he’s being very careful. And I think he’s said as much on the podcast too.
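To make that concrete, here is a rough Python sketch of the size-first, then byte-by-byte idea. It is only an illustration of the approach described above, not how John's app (or any other tool) actually works; the names `same_bytes` and `find_duplicates` and the 64 KiB chunk size are made up for the example.

```python
import os
from collections import defaultdict

CHUNK = 1 << 16  # 64 KiB per read; an arbitrary choice for this sketch


def same_bytes(path_a, path_b, chunk=CHUNK):
    """Compare two files block by block, stopping at the first difference."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk)
            block_b = b.read(chunk)
            if block_a != block_b:
                return False  # early exit: no need to read the rest
            if not block_a:
                return True   # both files hit EOF with no difference


def find_duplicates(paths):
    """Group paths by size first, then confirm matches byte by byte."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        clusters = []  # each cluster holds files with identical content
        for p in same_size:
            for cluster in clusters:
                if same_bytes(cluster[0], p):
                    cluster.append(p)
                    break
            else:
                clusters.append([p])
        groups.extend(c for c in clusters if len(c) > 1)
    return groups
```

The downside is that a size bucket with many different files needs many pairwise comparisons, which is where hashing starts to look attractive again.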
This is the default for ZFS deduplication, and git does something similar with size and the far weaker SHA-1. I would add a test for SHA-256 collisions, but no one seems to have found a working example yet.
0 - https://github.com/ttkb-oss/dedup
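For comparison, a rough Python sketch of the size-plus-SHA-256 approach. This is a generic illustration only; it is not taken from the linked dedup tool, ZFS, or git, and the function names are made up for the example.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk=1 << 16):
    """Hash a file in fixed-size blocks so it never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def group_by_size_and_hash(paths):
    """Treat files as duplicates when both size and SHA-256 digest match."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # skip hashing files whose size is unique
        for p in same_size:
            groups[(size, sha256_of(p))].append(p)
    return [g for g in groups.values() if len(g) > 1]
```

Each file is read at most once here, at the cost of trusting the hash rather than the bytes themselves.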