by _swa8
3707 days ago
I wrote mine along similar lines, but without any hashing. Files of identical size are compared byte by byte, until the first difference or the end of the file. It compares as many files as possible at a time, of course, so that no file has to be read more than once. This avoids any uncertainty about hash collisions.

To find out how many files of each size you have (needs GNU find for -printf):

  find ~ -type f -printf '%s\n' | sort | uniq -c | sort -n
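
For anyone curious what the lockstep comparison could look like, here is a rough sketch in Python. It is not my actual tool, just an illustration of the idea. The names (`group_by_size`, `split_identical`, `find_duplicates`) and the 64 KiB chunk size are made up for this example. It assumes files do not change while being read.

```python
import os
from collections import defaultdict

CHUNK = 64 * 1024  # hypothetical chunk size, picked arbitrarily for the sketch


def group_by_size(root):
    """Map file size -> list of regular (non-symlink) files under root."""
    sizes = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                sizes[os.path.getsize(path)].append(path)
    return sizes


def split_identical(paths):
    """Read same-size files in lockstep, one chunk at a time.

    Files whose chunks differ are split into separate groups; a file that
    ends up alone is dropped without reading the rest of it. Returns a list
    of groups (lists of paths) whose contents are byte-for-byte identical.
    """
    handles = {p: open(p, "rb") for p in paths}
    try:
        groups = [list(paths)]
        identical = []
        while groups:
            next_groups = []
            for group in groups:
                # Bucket the group's files by the bytes of their next chunk.
                buckets = defaultdict(list)
                for p in group:
                    buckets[handles[p].read(CHUNK)].append(p)
                for chunk, members in buckets.items():
                    if len(members) < 2:
                        continue  # no longer matches anything: stop reading it
                    if chunk == b"":
                        identical.append(members)  # reached end of file together
                    else:
                        next_groups.append(members)
            groups = next_groups
        return identical
    finally:
        for h in handles.values():
            h.close()


def find_duplicates(root):
    """Return groups of identical files under root, with no hashing involved."""
    result = []
    for _size, paths in group_by_size(root).items():
        if len(paths) > 1:
            result.extend(split_identical(paths))
    return result
```

Calling `find_duplicates(os.path.expanduser("~"))` would return a list of lists of paths. Note that the sketch opens every file of a given size at once, so a very large group can run into the per-process open file limit. A real tool would have to cap how many files it reads at a time.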