by empathy_m 1057 days ago
The part that resonates here is saying

"ah yes well we have a full database backup so we can do a full restore", then

"the full restore will be tough and involve downtime and has some side effects," then

"I bet we could be clever and restore only part of the data that are missing", then

doing that by hand, which hits weird errors, then

finally shipping the jury-rigged selective restore and cleaning up the last five missing pieces of data (hoping you didn't miss a sixth)

This happens every time someone practices backup/restore, no matter how hard they've worked in advance. Deciding what data to put back from the backup image always ends up being an application-level call.
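To make the "selective restore" step concrete, here is a minimal sketch, not what anyone in this thread actually ran. It assumes a hypothetical `posts(id, body)` table held in SQLite, and a hypothetical helper name `restore_missing_posts`. It copies back only rows whose `id` is absent from the live database and leaves everything else alone.

```python
import sqlite3


def restore_missing_posts(live, backup):
    """Copy rows that exist in the backup but not in the live database.

    Assumes a hypothetical schema: posts(id INTEGER PRIMARY KEY, body TEXT).
    Rows present in both databases are left untouched, even if they differ.
    """
    live_ids = {row[0] for row in live.execute("SELECT id FROM posts")}
    restored = []
    for post_id, body in backup.execute("SELECT id, body FROM posts ORDER BY id"):
        if post_id not in live_ids:
            live.execute(
                "INSERT INTO posts (id, body) VALUES (?, ?)", (post_id, body)
            )
            restored.append(post_id)
    live.commit()
    return restored
```

Note the policy baked in here: a row that exists in both places but was edited since the backup is skipped. Whether that is right depends on the application, which is exactly why this ends up being an application-level decision.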

1 comment

I agree with you. The saying goes: you don’t have backups unless you test your backups.

But in this case I don’t really get what the issue is. Restore everything from the last good backup, and people lose some posts made in the meantime. That sucks, but it’s an instant solution instead of hand work and uncertainty.

When I worked as a VMS sysadmin, full restore checks were one of the things I insisted on doing. Sure, it used up a morning every couple of weeks and tied up one of our MicroVAXes, but it was worth it.

Especially three months after I finished being sysadmin and moved to development, when they had a disk failure.

me: 'so you have backups?'

the replacement: 'sure, but they didn't restore'

me: 'what's the last good backup you have?'

tr: 'august, the last one you did'

me: 'welp'

tr's boss: 'guess £390,000 for third party disk recovery is our only option...'

To add some context...

Yes, it was documented in our ISO 9000 docs, but a regular test restore was only 'strongly recommended'. I attempted to get it converted to a mandatory step, but since I was only a temporary sysadmin and an intern, it wasn't going to happen.

I was told by my predecessor (who was a direct contractor to my employer) to perform it as routinely as I could. I would guess that he had attempted to get it made a mandatory step, but his time was billed, mine wasn't, so shrug.

My/the replacement was an external contractor as part of a 'company Y now provides system administration services' deal, who presumably ended up eating the liability of not having working backups that they were contracted to produce.

As horrified as I was, 'it's not really my problem, I wasn't responsible' was the only attitude I could bear to take. Besides, I was busy with Fortran.
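For anyone wanting to automate the kind of routine test restore described above, a rough sketch follows. It is an illustration under assumptions, not the VMS procedure from the story: it assumes the backup has already been restored into a scratch directory, and the function names `tree_digests` and `verify_restore` are made up for this example. It compares SHA-256 digests of every file in the source tree against the restored copy.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_restore(source, restored):
    """Return (missing, corrupted) relative paths found by comparing two trees.

    missing:   files in the source tree that the restore did not produce
    corrupted: files present in both trees whose contents differ
    """
    src = tree_digests(source)
    dst = tree_digests(restored)
    missing = sorted(set(src) - set(dst))
    corrupted = sorted(name for name in src.keys() & dst.keys() if src[name] != dst[name])
    return missing, corrupted
```

Two empty lists mean the restored tree matches the source. On a live system the source keeps changing after the backup is taken, so in practice you would compare against a snapshot or a manifest recorded at backup time. This sketch also reads each file fully into memory, which is fine for a demo but not for large files.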

"If you've never tested your backup solution, you don't have a backup solution."