by nikisweeting 602 days ago
How would you want the duplicates resolved? Just reported in some interface, or would you want them deleted off some machines automatically?

There are a few different ways you could solve it, but it depends on what final outcome you need.


Just a report in some plain-text format, so I can delete the duplicates manually or write a script to delete them.

I can't have like 10 external HDDs attached at the same time, so the tool needs to dump details (hashes?) somewhere on the Mac's HDD and compare against those to find the duplicates.

Here you go:

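    cd /path/to/drive
    # sha256sum prints "<hash>  ./<path>" (two spaces); this stores "<hash>,<drive name>/<path>".
    # If sha256sum is missing on your Mac, `shasum -a 256` prints the same format.
    find . -type f -exec sha256sum {} + | sed -E "s|^([0-9a-f]{64})  \./|\1,$(basename "$PWD")/|" >> ~/all_hashes.txt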
Run that for each drive, then when you're done run:

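    # Two passes over the same file: the first counts each hash (the text before the first comma),
    # the second prints every line whose hash occurs more than once. Plain awk, so it also works on macOS.
    awk -F, 'NR==FNR{n[$1]++; next} n[$1]>1' ~/all_hashes.txt ~/all_hashes.txt | sort > ~/non_unique_hashes.txt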
~/non_unique_hashes.txt will then contain only the lines whose hash appears at more than one path, sorted so that copies of the same file sit next to each other.
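If you'd rather read the result as one block per file, the list can be grouped by hash. This is a minimal sketch, assuming each line of ~/non_unique_hashes.txt looks like `<hash>,<location>` and that equal hashes are adjacent. `report_duplicates` is a hypothetical helper name, and marking the first copy in each group as the one to keep is an arbitrary choice, not anything the commands above decide.

```shell
# report_duplicates: hypothetical helper, not part of any existing tool.
# Input: lines of "<hash>,<location>", sorted so equal hashes are adjacent.
# Output: one block per hash; the first location is labelled KEEP (arbitrary), the rest DUPLICATE.
report_duplicates() {
  awk '
  {
    comma = index($0, ",")                  # split on the first comma only; paths may contain commas
    hash = substr($0, 1, comma - 1)
    path = substr($0, comma + 1)
    if (hash != prev) {                     # first line of a new group of identical files
      if (NR > 1) print ""                  # blank line between groups
      print "== " hash
      printf "%-9s %s\n", "KEEP", path      # arbitrarily keep the first copy listed
      prev = hash
    } else {
      printf "%-9s %s\n", "DUPLICATE", path # every later copy is a deletion candidate
    }
  }' "${1:-$HOME/non_unique_hashes.txt}"
}
```

Usage: `report_duplicates > ~/duplicate_report.txt`, then `grep '^DUPLICATE' ~/duplicate_report.txt` lists the deletion candidates. Nothing is deleted; review the list and remove files yourself once the drive holding each one is attached.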