by tetha 912 days ago
It is also a good idea to test the restore procedures and the documentation.

Don't have the grizzled old storage admin / DBA test the backup. They know a million and one weird necessary workarounds and just execute them. However, if you need a restore and they are currently exploring caves or something, things turn dire. Have a chipper junior restore something based off of the documentation (and prepare to spend a few days updating documentation...)

And make sure to test backups you don't regularly touch. And very much test those backups you really don't want to test.
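
One way a test restore could be checked mechanically, as a minimal sketch rather than a description of anyone's actual tooling: hash every file in a reference tree and in the restored tree, then report what is missing, unexpected, or changed. The function names `manifest` and `compare_restore` are made up for illustration. It assumes the reference tree is static; for live data you would compare against a manifest recorded at backup time instead. It also reads each file fully into memory, which is fine for a sketch but not for large files.

```python
import hashlib
from pathlib import Path


def manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = digest
    return result


def compare_restore(source_root, restored_root):
    """Return (missing, unexpected, changed) relative paths, each sorted."""
    src = manifest(source_root)
    dst = manifest(restored_root)
    missing = sorted(set(src) - set(dst))        # in source, not restored
    unexpected = sorted(set(dst) - set(src))     # restored, not in source
    changed = sorted(p for p in src.keys() & dst.keys() if src[p] != dst[p])
    return missing, unexpected, changed
```

A restore passes this particular check when all three lists come back empty. It says nothing about whether the documentation was followable, which is the part the junior is there to test.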

2 comments

As a grizzled old storage admin who somehow made a career out of database backups, I wholeheartedly agree with all of this. Especially having someone else do a test restore. They don't have to be junior, just not intimately familiar with the systems involved.
I’ll go a step further:

Have a different person do it each time, having them add and refine the documentation and any tooling once they’ve done it. Keep any tools and scripts used fastidiously current: few things are worse than “to fix this issue, run the repair.sh script” only to find it stopped working 6 months ago because it relied on some extremely specific lib somewhere.
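
A small pre-flight check can catch the crudest form of that rot. The sketch below only verifies that the executables a runbook shells out to are still on PATH; `missing_tools` is a hypothetical helper name, and the tool list in the usage note is an assumption, not something from this thread. It will not notice a missing library or a changed flag, so it complements an actual scheduled test restore and does not replace one.

```python
import shutil


def missing_tools(required):
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if shutil.which(name) is None]
```

Called as, say, `missing_tools(["pg_restore", "tar", "gzip"])` from a scheduled job, a non-empty result is a cue to fix the tooling before anyone needs it at 3 a.m.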

Oh, database backups are fun, especially if the database server is still running! You want multiple databases, all at a consistent point in time, without taking the system offline? SUFFER, YOU FOOL! The joy of realizing that file system snapshots won't help with data that's still to be committed and that taking a backup database-by-database means that two databases whose data relies on each other are no longer properly in sync really warms my heart. Oh wait, it's the whiskey that runs through my veins that does that, being a backup operator is a fantastic pathway into alcoholism. Especially once the databases become so large that the time it takes to take the backup becomes a performance concern of the running system.
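
The database-by-database problem is easy to reproduce in miniature. The sketch below uses two in-memory SQLite databases purely as a stand-in (the comment above is not about SQLite), with made-up `orders` and `payments` tables that are supposed to stay in step. Each database is backed up online with `sqlite3.Connection.backup`, but a write lands on both between the two backups.

```python
import sqlite3


def snapshot(conn):
    """Copy a live SQLite database into a fresh in-memory one (online backup)."""
    copy = sqlite3.connect(":memory:")
    conn.backup(copy)
    return copy


def total(conn, table):
    """Sum the `amount` column of a table (table names here are hard-coded)."""
    return conn.execute(f"SELECT COALESCE(SUM(amount), 0) FROM {table}").fetchone()[0]


# Two separate databases whose data depends on each other:
# every payment in `billing` should match an order in `orders`.
orders = sqlite3.connect(":memory:")
billing = sqlite3.connect(":memory:")
orders.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
billing.execute("CREATE TABLE payments (order_id INTEGER, amount INTEGER)")
orders.execute("INSERT INTO orders VALUES (1, 100)")
billing.execute("INSERT INTO payments VALUES (1, 100)")
orders.commit()
billing.commit()

# Back up the first database...
orders_backup = snapshot(orders)

# ...while the application keeps writing to both...
orders.execute("INSERT INTO orders VALUES (2, 50)")
billing.execute("INSERT INTO payments VALUES (2, 50)")
orders.commit()
billing.commit()

# ...and only then back up the second database.
billing_backup = snapshot(billing)

live_in_sync = total(orders, "orders") == total(billing, "payments")
backup_in_sync = total(orders_backup, "orders") == total(billing_backup, "payments")
```

Here `live_in_sync` ends up `True` and `backup_in_sync` ends up `False`: each backup is internally valid, yet the restored pair contains a payment for an order that does not exist. Real systems avoid this with a single consistent point in time across everything, which is exactly the part that hurts.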

I think Postgres did it right by abbreviating their "Continuous Archiving and Point-in-Time Recovery" as PITR because it's very close to PITA. But PITR and CHECKPOINT actually make Postgres probably one of the better database systems to back up (and restore!), so yet another reason why I think it's a fantastic database.

One nice thing about a circa 2010 MySQL setup is that setting up a new replica is easiest by restoring a backup. If you have to do that from time to time, your backups get tested as part of a regular process.