Hacker News new | ask | show | jobs
by jonhohle 478 days ago
Yeah, there is definitely some merit to more efficient hashing. Trees with a lot of duplicates require a lot of hashing, but hashing the entire file would be required regardless of whether partial hashes or done or not.

I have one data set where `dedup` was 40% faster than `dupe-krill` and another where `dupe-drill` was 45% faster than `dedup`.

`dupe-krill` uses blake3, which last I checked, was not hardware accelerated on M series processors. What's interesting is that because of hardware acceleration, `dedup` is mostly CPU-idle, waiting on the hash calculation, while `dupe-krill` is maxing out 3 cores.

Thanks for the link!