by chanandler_bong 1044 days ago
As a customer impacted by this, I find it worrying that I only found out about it when my backups and automated test restore failed.

I get that things happen, but not realising that this was going to be a service-impacting event does not inspire continued confidence. Sure, this situation probably won't happen again, but what else don't they understand about their infrastructure?

3 comments

It looks to me like their US10 is not like an AZ but an actual server with a bunch of HBAs and disks. So very much a pet, and not only in a single location but possibly in a single rack or even a single box.

You are (maybe) protected against a few disk failures but that's about it.

This FAQ entry seems to confirm this: https://docs.borgbase.com/faq/#which-storage-backend-are-you...

I got an email about it right away and I’ve also been getting warnings (that I configured in the dashboard) for inactivity on a repo that’s affected.

(Off topic) I promised to reply to you in another thread, but I can't because replies are locked. Feel free to reach out to me if you want; my contact is in my profile.
> automated test restore failed

You automatically test restore? That makes sense but I've never heard of that before, can you describe the process?

Not OP, but I would guess it's something like this:

  1. Make a file of random data, e.g. 30MB
  2. Copy it to a "_reference" file
  3. Upload the file to the backup service
  4. Restore the file from the backup service
  5. Diff the restored file against the reference
Pretty simple, really.
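
A minimal sketch of those five steps, in case it helps. The `backup_upload` and `backup_restore` functions are hypothetical placeholders, not any real service's CLI; here they just copy through a local directory so the script runs as-is, and you would swap in your own backup tool's commands. The probe file is 1 MB rather than 30MB only to keep the run quick.

```shell
#!/bin/sh
# Round-trip restore test (sketch).
# backup_upload / backup_restore are HYPOTHETICAL stand-ins for a real
# backup client; they copy through the local directory $STORE.

backup_upload()  { cp "$1" "$STORE/$(basename "$1")"; }
backup_restore() { cp "$STORE/$1" "$2"; }

roundtrip_test() {
    work=$(mktemp -d) || return 2
    STORE="$work/store"
    mkdir "$STORE"

    # 1. Make a file of random data.
    head -c 1048576 /dev/urandom > "$work/probe.bin"
    # 2. Keep a reference copy.
    cp "$work/probe.bin" "$work/probe.bin_reference"
    # 3. Upload the file to the (placeholder) backup service.
    backup_upload "$work/probe.bin"
    # Remove the local original so the restore has to produce it.
    rm "$work/probe.bin"
    # 4. Restore the file from the (placeholder) backup service.
    backup_restore probe.bin "$work/probe.bin"
    # 5. Compare the restored file against the reference, byte for byte.
    if cmp -s "$work/probe.bin" "$work/probe.bin_reference"; then
        status=0
    else
        status=1
    fi

    rm -rf "$work"
    return "$status"
}

if roundtrip_test; then
    echo "restore OK"
else
    echo "restore FAILED"
fi
```

Run from cron, the non-zero return on a mismatch is what you would hook an alert to.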

Pick a couple random files that should be in the repo, restore them from a random archive, check the md5sums against the source. If the md5sums don't match (or the file can't be found), something is wrong. I am mainly backing up RAW image files, so they should never change.

Basically...

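  # Pick a random regular file from the source directory
  TEST_FILE=$(ls -p /source_dir | grep -v / | shuf -n1)
  # Pick a random archive from the repository
  TEST_ARCHIVE=$(borgmatic -c config.file list | shuf -n1)
  # Restore that file from that archive
  borgmatic extract yada yada yada
  # Compare the checksum of the source file with the restored copy
  md5sum "/source_dir/$TEST_FILE" restored_file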
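
To make the comparison step concrete, here is a rough sketch of the checksum check on its own. It assumes GNU coreutils (`shuf`, `md5sum`) and assumes the archive has already been extracted into a directory by whatever restore command you use; the function name `spot_check` and both directory arguments are made up for illustration.

```shell
#!/bin/sh
# spot_check SRC_DIR RESTORED_DIR  (hypothetical helper, not part of borgmatic)
# Picks one random regular file from SRC_DIR and compares its MD5 checksum
# with the same-named file under RESTORED_DIR, which is assumed to hold
# an already-extracted archive.
spot_check() {
    src_dir=$1
    restored_dir=$2

    # Use find rather than parsing ls; top-level regular files only.
    name=$(find "$src_dir" -maxdepth 1 -type f -exec basename {} \; | shuf -n1)
    if [ -z "$name" ]; then
        echo "no files in $src_dir" >&2
        return 2
    fi

    # A file missing from the restore counts as a failure.
    if [ ! -f "$restored_dir/$name" ]; then
        echo "MISSING: $name" >&2
        return 1
    fi

    # Reading from stdin makes md5sum print "<hash>  -"; keep the hash only.
    src_sum=$(md5sum < "$src_dir/$name" | cut -d' ' -f1)
    restored_sum=$(md5sum < "$restored_dir/$name" | cut -d' ' -f1)

    if [ "$src_sum" = "$restored_sum" ]; then
        echo "OK: $name"
    else
        echo "MISMATCH: $name" >&2
        return 1
    fi
}
```

This only works as a check when the source files are never modified after backup, as with the RAW images mentioned above; otherwise a mismatch may just mean the file changed since the archive was made.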

I don't use borg, but I used duplicity, which offers something like that. The verify operation simulates a restore and checks whether each restored file's checksum matches the one recorded in the metadata, and optionally compares against the local file. I use this routinely; it's interesting to see that a local S3 provider can sometimes silently mess up your files.