by kardos 3497 days ago
fdupes is not a problem, assuming Wikipedia's description [1] is correct: "It first compares file sizes, partial MD5 signatures, full MD5 signatures, and then performs a byte-by-byte comparison for verification."
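To make the staged approach in that quote concrete, here is a rough Python sketch of the same idea: narrow the candidates by size, then by a partial MD5, then by a full MD5, and only then confirm byte by byte. This is not fdupes' actual code. The names `find_duplicates`, `md5_of`, `group_by` and the 4096-byte `PARTIAL_BYTES` prefix are assumptions made up for illustration.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

PARTIAL_BYTES = 4096  # assumed prefix length for the "partial MD5" stage


def md5_of(path, limit=None):
    """MD5 of the first `limit` bytes of a file, or of the whole file if limit is None."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        if limit is not None:
            h.update(f.read(limit))
        else:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
    return h.hexdigest()


def group_by(paths, key):
    """Group paths by key(path) and keep only groups with more than one member."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]


def find_duplicates(paths):
    # Stage 1: files of different sizes cannot be identical.
    candidates = group_by(paths, os.path.getsize)
    # Stages 2 and 3: partial MD5 first, then full MD5, each within the surviving groups.
    for key in (lambda p: md5_of(p, PARTIAL_BYTES), md5_of):
        candidates = [sub for g in candidates for sub in group_by(g, key)]
    # Stage 4: byte-by-byte comparison, so a hash collision cannot produce a false match.
    result = []
    for group in candidates:
        remaining = list(group)
        while remaining:
            first, rest = remaining[0], remaining[1:]
            same = [first] + [p for p in rest if filecmp.cmp(first, p, shallow=False)]
            if len(same) > 1:
                result.append(same)
            remaining = [p for p in rest if p not in same]
    return result
```

Because the last stage compares actual bytes, the hashes here only serve to skip work; the weakness of MD5 does not affect correctness in this scheme.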

What I was unimpressed by is the md5 in the shell script at the original link, which uses a truncated md5...

[1] https://en.wikipedia.org/wiki/Fdupes

1 comment

Ok, fair enough. I agree that using md5, presumably for speed, is probably not the best trade-off to be making here. Unless you're dealing with an NVMe drive (or something more exotic), you're likely to be I/O bound even with a more computationally intensive hash function.

And if you are deduping on really fast storage, you'd get way better performance (with comparable safety) using something like xxHash64 (https://cyan4973.github.io/xxHash/).
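If you want to check the I/O-bound claim on your own machine, a rough sketch like the one below measures in-memory hashing throughput for a few hashlib algorithms, which you can then compare against your drive's sequential read speed. The helper name `hash_throughput` is made up for this example, the numbers vary a lot between machines, and xxHash is left out because it is not in the Python standard library (it would need a third-party package).

```python
import hashlib
import time


def hash_throughput(name, size_mb=64):
    """Return approximate hashing throughput in MiB/s for a hashlib algorithm name."""
    data = bytes(1 << 20)  # 1 MiB of zero bytes, reused for every update
    h = hashlib.new(name)
    start = time.perf_counter()
    for _ in range(size_mb):
        h.update(data)
    h.digest()
    elapsed = time.perf_counter() - start
    return size_mb / elapsed


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha256", "blake2b"):
        print(f"{name:8s} {hash_throughput(name):8.0f} MiB/s")
```

If every figure printed is well above what the disk can deliver, the choice of hash is unlikely to be the bottleneck for that setup.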