Hacker News
by harshreality 4774 days ago
Running backups from a cron job and glancing at them now and then to confirm they look valid is not sufficient for any serious website or web service.

Automated backups need automated restoration and testing. Otherwise, the backups might not have been created properly, or they might look like perfect backups but contain some hidden error that makes them fail when you actually need them.
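A minimal sketch of what an automated restore test might look like, using Python's built-in sqlite3 module as a stand-in for a real database (it does not model Oracle or the bug discussed below). The function names `make_backup` and `verify_backup`, the scratch-directory restore, and the row-count check are hypothetical choices for illustration. The point is that the test restores the backup somewhere disposable, opens it, and queries it, instead of just checking that a backup file exists.

```python
import os
import sqlite3
import tempfile


def make_backup(live: sqlite3.Connection, backup_path: str) -> None:
    """Hypothetical helper: copy the live database to backup_path
    using SQLite's online backup API (Connection.backup)."""
    dest = sqlite3.connect(backup_path)
    try:
        live.backup(dest)
    finally:
        dest.close()


def verify_backup(backup_path: str, expected_counts: dict[str, int]) -> list[str]:
    """Hypothetical restore test: restore the backup into a scratch copy,
    open it, and run checks. Returns a list of problems; empty means it passed."""
    # sqlite3.connect() would silently create an empty file, so check first.
    if not os.path.exists(backup_path):
        return [f"backup missing: {backup_path}"]

    problems: list[str] = []
    with tempfile.TemporaryDirectory() as scratch:
        restored_path = os.path.join(scratch, "restored.db")
        src = sqlite3.connect(backup_path)
        restored = sqlite3.connect(restored_path)
        try:
            # Restore step: copy into a throwaway location so the test
            # never modifies the backup itself.
            src.backup(restored)
            # Actually open and query the restored copy; this is what catches
            # "the file is there but the database won't come up" failures.
            result = restored.execute("PRAGMA integrity_check").fetchone()[0]
            if result != "ok":
                problems.append(f"integrity_check: {result}")
            # Assumed sanity check: compare row counts against expected values.
            # Table names come from the caller, not from untrusted input.
            for table, expected in expected_counts.items():
                (count,) = restored.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
                if count != expected:
                    problems.append(f"{table}: expected {expected} rows, got {count}")
        except sqlite3.DatabaseError as exc:
            problems.append(f"restored database unusable: {exc}")
        finally:
            src.close()
            restored.close()
    return problems


if __name__ == "__main__":
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    live.executemany("INSERT INTO orders (item) VALUES (?)", [("book",), ("lamp",)])
    live.commit()
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "nightly.db")
        make_backup(live, path)
        problems = verify_backup(path, {"orders": 2})
        print("restore test passed" if not problems else problems)
```

In practice you would probably run a check like this on the same schedule as the backup and alert whenever it returns a non-empty list. A real test would also exercise the application's own startup path against the restored copy, since row counts alone won't catch every problem.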

Take, for example, Jeremiah Wilton's case study of his own experience with Amazon's Oracle database failure in 1997: http://www.bluegecko.net/download/disaster-diary.pdf

Apart from one missed backup, the backup procedures were fine. The real problem was an Oracle bug: a database format/schema change made weeks earlier caused Oracle to refuse to start. TESTING the backups would have caught the error and given them a chance to fix it before they took down their production database and hit the bug on the next attempt to start it again.