by geoff-codes
3946 days ago
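Identical is pretty easy. Maybe something like this.

#!/bin/sh
# Usage: script [dir] [depth]   (either order; both optional)
here=$PWD
dir=.
depth=
list=$(mktemp) || exit 1
trap 'rm -f "$list"' EXIT

# An argument that exists on disk is the directory; anything else is the depth.
for arg in "$@"; do
  if [ -e "$arg" ]; then dir=$arg; else depth=$arg; fi
done

# -mindepth/-maxdepth work with both GNU and BSD find, unlike BSD-only "-depth n".
if [ -n "$depth" ]; then set -- -mindepth "$depth" -maxdepth "$depth"; else set --; fi

find -L "$dir" "$@" -type d 2>/dev/null | while IFS= read -r each; do
  cd "$each" 2>/dev/null || continue
  # Skip empty directories; hash file names plus contents, ignoring timestamps.
  if [ $(find . 2>/dev/null | wc -l) -gt 1 ]; then
    sha=$(tar cf - . 2>/dev/null | tar xmvOf - 2>&1 | shasum | sed 's| .*||')
    printf '%s %s\n' "$sha" "$each" >> "$list"
  fi
  cd "$here"
done

# Report every hash that was recorded for more than one directory.
for sha in $(sed 's| .*||' "$list" | sort -u); do
  if [ "$(grep -c "^$sha " "$list")" -gt 1 ]; then
    echo "Identical directories:"
    grep "^$sha " "$list" | sed 's|^[^ ]* ||'
    echo
  fi
done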

Similar and "similar-and-strictly-newer" are both much trickier, as you have to invent a rubric for what "similar" means, and `diff -qr` isn't going to tell you if, say, the files are mostly the same but have been moved into a subdirectory. So I'd probably use git, traversing the file tree by moving the .git dir around and adding each candidate directory as a different branch, and doing a `git gc` each time to try to keep the size of the repository manageable. Then doing a `git diff [--word-diff] [--stat] --find-copies-harder` between branches will pick up files that have been moved around, etc. You could literally do this for every directory and subdirectory, but if you can narrow it down to, say, directories with the same basename, it would be substantially easier.

On the other hand, I would say I suffer from this same ailment; I just mostly don't bother sorting it out. I just use something like http://cpansearch.perl.org/src/ANDK/Perl-Repository-APC-2.00... to make hard links between identical files, keeping the size of the monstrosity in check.
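
For the git idea, here is a rough, untuned sketch of how it might look for just two directories. Rather than physically moving the .git dir around, it points git at a scratch bare repository with `--git-dir` and at each candidate directory with `--work-tree`, which should amount to the same thing. The helper names (`g`, `snapshot`, `compare_dirs`), the branch names (`snap-a`, `snap-b`) and the throwaway commit identity are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: g, snapshot, compare_dirs and the snap-a/snap-b branch names
# are illustrative assumptions, not part of any existing tool.

# g <repo> <git args...>: run git against the scratch repository with a
# throwaway identity, so commits work even without a global git config.
g() {
    g_repo=$1; shift
    git --git-dir="$g_repo" -c user.name=snapshot \
        -c user.email=snapshot@example.invalid "$@"
}

# snapshot <repo> <branch> <dir>: record the contents of <dir> as a single
# root commit on <branch>.
snapshot() {
    s_repo=$1 s_branch=$2 s_dir=$3
    # Point HEAD at a not-yet-existing branch and start from an empty index.
    g "$s_repo" symbolic-ref HEAD "refs/heads/$s_branch" || return 1
    rm -f "$s_repo/index"
    (
        cd "$s_dir" || exit 1
        g "$s_repo" --work-tree=. add -A . &&
        g "$s_repo" --work-tree=. commit -q --allow-empty -m "snapshot of $s_dir"
    ) || return 1
    # Repack after each import to keep the scratch repository small.
    g "$s_repo" gc --quiet
}

# compare_dirs <dirA> <dirB>: print a rename/copy-aware diffstat.
compare_dirs() {
    c_repo=$(mktemp -d) || return 1
    git init -q --bare "$c_repo" &&
    snapshot "$c_repo" snap-a "$1" &&
    snapshot "$c_repo" snap-b "$2" &&
    g "$c_repo" diff --stat --find-copies-harder \
        refs/heads/snap-a refs/heads/snap-b --
    c_status=$?
    rm -rf "$c_repo"
    return $c_status
}
```

Something like `compare_dirs ~/src/foo ~/backup/foo-old` (hypothetical paths) should then list files that were moved into a subdirectory as renames instead of as a delete plus an add. Extending it to more than two directories would mean one branch per candidate and a diff per pair, which is why narrowing the candidates down first matters.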