by 369548684892826 814 days ago
For static files like photos, I hash-check drives against each other to detect bit rot. But for tape-based cloud storage, I can't think of anything else to do except restore one file to confirm I still have access; restoring the whole archive is too expensive.
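A minimal sketch of the two checks described in the comment above, assuming GNU coreutils and findutils. The function names, paths and manifest file are hypothetical, not the poster's actual setup. `compare_trees` builds a sorted SHA-256 manifest of each copy and diffs them; a mismatch tells you the copies disagree, but not which one rotted. `verify_restored` covers the cloud case: take one line from a manifest saved before upload, restore that file however the provider allows (not shown), and check its hash.

```shell
#!/usr/bin/env bash
# Sketch: cross-check two copies by SHA-256, plus a one-file restore spot check.
# Assumes GNU coreutils/findutils (sha256sum -c --quiet, sort -z, xargs -0 -r).

# make_manifest DIR: print "hash  ./path" for every regular file under DIR,
# sorted by path so two copies of the same tree produce identical listings
make_manifest() {
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum)
}

# compare_trees DIR_A DIR_B: print differing manifest lines; non-zero on mismatch
compare_trees() {
  local a b rc
  a=$(mktemp)
  b=$(mktemp)
  make_manifest "$1" > "$a"
  make_manifest "$2" > "$b"
  diff "$a" "$b"
  rc=$?
  rm -f "$a" "$b"
  return "$rc"
}

# verify_restored MANIFEST_LINE RESTORE_DIR: check one restored file against the
# hash recorded before upload; assumes it was restored at the same relative path
verify_restored() {
  (cd "$2" && printf '%s\n' "$1" | sha256sum -c --quiet -)
}
```

For example, `compare_trees /mnt/disk1/photos /mnt/disk2/photos` (hypothetical mount points) prints nothing and returns 0 when every file matches, and `verify_restored "$(shuf -n 1 photos.sha256)" ./restored` checks one randomly chosen file from a hypothetical saved manifest.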

Do you have any automation or scripts for that, or is it usually ad hoc?
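Yeah, I use rsync with `-cavin` (checksum, archive, verbose, itemize changes, dry run). If the output includes lines starting with `>` or `<`, those files have different checksums (or are missing on the other side).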

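  #!/bin/bash
  # Dry run (-n) comparing by checksum (-c); -a archive, -v verbose, -i itemize.
  # --info=name2 also lists unchanged files; --no-perms/--no-owner/--no-group
  # stop permission and ownership differences from being reported.
  hostname=$(hostname)
  checksumFile=/tmp/checksum-file.txt
  
  rsync -cavin --info=name2 --no-perms --no-owner --no-group \
    /srv/data/photos/ user@remote:/media/data/rsnapshot/daily.0/srv/data/photos/ > "$checksumFile"
  
  # Every file appears in the listing, so a short one suggests something went wrong.
  minimumSize=5000
  actualSize=$(wc -l < "$checksumFile")
  
  if [ "$actualSize" -ge "$minimumSize" ]; then
      # Itemized lines that start with < or > are files that differ.
      fileDiffs=$(grep -E '^[<>]' "$checksumFile" || echo 'all checksums match')
  else
      fileDiffs="$checksumFile is too small"
  fi
  
  # Quoted so each differing file stays on its own line.
  echo "$fileDiffs"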
You absolutely should NOT automate this. If you do, you then need to manually check that automation with the same frequency, so you haven’t gained anything!
I do have this automated, but the checksum output is posted to a kind of watchdog service. Every day I get an email that says everything is as expected, or not.
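The comment doesn't say which watchdog service is used, so this is only a guess at the shape: a hypothetical wrapper that runs any check command, labels the result `ok` or `mismatch`, and posts it with curl to `WATCHDOG_URL` (a hypothetical endpoint; the report is printed when it is unset). Assuming the service also alerts when a daily report fails to arrive, a dead cron job or machine raises an alert too, which is one answer to "who checks the automation". It also assumes the check signals problems through its exit status, which the rsync script above does not do as written (it only prints).

```shell
#!/usr/bin/env bash
# Sketch of reporting a check result to a watchdog service.
# WATCHDOG_URL is a hypothetical endpoint; if unset, the report is printed.

# send_report STATUS DETAILS: one header line, then the check's output
send_report() {
  local report
  report="$(uname -n) $(date -u +%Y-%m-%dT%H:%M:%SZ) status=$1
$2"
  if [ -n "${WATCHDOG_URL:-}" ]; then
    # -f: fail on HTTP errors, -sS: quiet but still show errors
    printf '%s\n' "$report" | curl -fsS --data-binary @- "$WATCHDOG_URL"
  else
    printf '%s\n' "$report"
  fi
}

# run_and_report CMD [ARGS...]: exit status 0 => ok, anything else => mismatch
run_and_report() {
  local output status
  if output=$("$@" 2>&1); then
    status=ok
  else
    status=mismatch
  fi
  send_report "$status" "$output"
}
```

A daily cron job could then call something like `run_and_report /usr/local/bin/check-photos.sh` (hypothetical script path).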
So I'm supposed to google a good process for hashing files and comparing them across disks every time? And on top of that, remember to do it often enough?
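One way to avoid re-googling is to keep a reusable manifest around. A minimal sketch, assuming GNU coreutils; the function names, paths and schedule are hypothetical. Hash the tree once, save the manifest outside the tree, and later re-verify the same disk against it. Unlike comparing two drives, this tells you which file changed. Files added after the manifest was written are not reported, and files you edit on purpose will show as FAILED until you regenerate it.

```shell
#!/usr/bin/env bash
# Sketch: hash a tree once, keep the manifest, re-check the same disk later.
# Assumes GNU coreutils. Keep the manifest OUTSIDE the tree being hashed,
# otherwise it would hash itself while being written.

# create_manifest DIR OUT: hash every regular file under DIR, paths relative to DIR
create_manifest() {
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum) > "$2"
}

# verify_manifest DIR MANIFEST: prints only files that changed or went missing
# and returns non-zero if any did; the redirect keeps MANIFEST relative to the caller
verify_manifest() {
  (cd "$1" && sha256sum -c --quiet -) < "$2"
}

# To avoid having to remember, schedule it; e.g. a crontab entry running a
# wrapper script (hypothetical path) at 03:00 every Sunday:
#   0 3 * * 0 /usr/local/bin/verify-photos.sh
```

For example, `verify_manifest /mnt/disk1/photos ~/photos.sha256 || echo 'bit rot or missing files'` (hypothetical paths).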