by ansible 3686 days ago
In addition to what Yev has said, I'd say:

You don't have backups, unless you can do restores. So you have to practice doing restores.

For my servers at work, we're using btrfs snapshots and send/receive to the backup host. So restoring files is just going into the appropriate snapshot directory, and copying out the files of interest.
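For anyone who wants to script that kind of restore, here is a minimal sketch in Python. It assumes the snapshot is just a directory on the backup host that mirrors the original tree; the function name and the paths in the usage note are hypothetical, not anything from the setup described above.

```python
import shutil
from pathlib import Path


def restore_from_snapshot(snapshot_dir: Path, rel_path: str, dest_root: Path) -> Path:
    """Copy one file or directory tree out of a snapshot directory.

    snapshot_dir: root of a single snapshot (assumed to mirror the live tree)
    rel_path:     path of the file or directory inside the snapshot
    dest_root:    where the restored copy should be placed
    """
    src = snapshot_dir / rel_path
    if not src.exists():
        raise FileNotFoundError(f"{rel_path} is not in snapshot {snapshot_dir}")

    dest = dest_root / rel_path
    dest.parent.mkdir(parents=True, exist_ok=True)

    if src.is_dir():
        # dirs_exist_ok needs Python 3.8+; existing files are overwritten
        shutil.copytree(src, dest, dirs_exist_ok=True)
    else:
        # copy2 also preserves timestamps and permission bits
        shutil.copy2(src, dest)
    return dest
```

Usage would look something like `restore_from_snapshot(Path("/backup/snapshots/daily-01"), "etc/app.conf", Path("/tmp/restore"))`, with both paths made up for illustration. Restoring into a scratch directory first, rather than straight over the live files, makes it easy to inspect what came back.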

If your backup scheme is any more complicated than that, you need to practice it at least a few times per year so that it is completely familiar.

Hilarious story from the old days...

We were doing backups to QIC tape drives. At one point, there was a lightning storm. The servers were plugged into UPSs with power protection though.

However, when running a backup, I noticed that the tape drive sounded a little different. So I checked one of the backup tapes... the tape drive would no longer switch tracks on the tape. So it was just overwriting the same track again and again. Corrupted backups. Worse yet: silently corrupted backups. No messages from the OS about a hardware problem.

That could have been bad news if it hadn't been caught quickly.
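These days you don't have to rely on your ears alone to catch that kind of silent corruption. A rough sketch, assuming the backup can be read back as an ordinary directory tree: hash every source file and compare it with the hash of its backed-up copy. The function names are my own, and this is not what we did with the tapes back then.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_copies(source_root: Path, backup_root: Path) -> List[str]:
    """List source files whose backup copy is missing or differs.

    Paths are returned relative to source_root, in sorted order.
    """
    bad = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        copy = backup_root / rel
        if not copy.is_file() or sha256_of(copy) != sha256_of(src):
            bad.append(rel.as_posix())
    return bad
```

An empty list means every source file has a byte-identical copy in the backup. Note that it only tells you the copy matches the source as it is now, so files that changed since the backup ran will also show up.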


> You don't have backups, unless you can do restores. So you have to practice doing restores.

100% - I worked hard to make sure that was in the Best Practices sent out to every person who signs up with Backblaze. Restoring is the most important part. So far we have over 200PB of backups, but the stat that I like even more is that we have restored over 10 billion files.

I realize this is slightly off topic, but I want to nerd out for a moment re: your comment on hearing something wrong with the tape drive. Listening is a skill I've always felt is under-recognized for how much of a "superpower" it gives a good sysop. Broken AC belts, bad hard drive backplanes, boot cycles: all things I've run into where the sound was the cue. Detecting an unalerted tape drive failure is the icing on the diagnostic cake.

Audio is an incredibly rich feedback mechanism for all kinds of mechanical processes. And the fascinating part about it is that our brains process it so effortlessly. If the data your ears can analyze from your car were presented visually, perhaps as a scrolling FFT spectral graph or plots of a host of sensors, you'd never notice a momentary misfire or a tiny change in pitch. It would be complete data overload! But even untrained ears can pick out errant noises.

I had another incident like that earlier in my IT career.

I was a 'terminal room consultant' in college... back when we had serial terminals hooked to Unix systems. Part of the job was the care and feeding of a couple printers, a big ol' line printer (green bar paper) and a Printronix graphics printer (dot matrix, for printing out fancy lab reports you wrote up using troff).

So over time, from loading paper and clearing jams, I had accumulated hours and hours of hearing these two guys chatter as they went about their business.

At one point, I noticed that the Printronix printer sounded funny. Just off, in some way. So I called it in for maintenance, but they didn't seem to care what an undergraduate punk thought about printer sounds.

Sure enough, a week later, I see it is down and taken apart for repairs.

Your ears, your nose, all your senses should be used for debugging and general investigation.

Here in the HGST EMEA lab, you will often find an engineer listening to a drive spin up with an induction pick-up and amplifier, muttering something like "Yep, this one's running firmware XYZ", or "Hmm, sounds like this one has the older, unmodified ramp".

> You don't have backups, unless you can do restores.

Bingo, I was just explaining this to someone yesterday. Testing the restores MUST be part of the backup strategy. If your db data is small enough to fit in your test environment, one approach I often use is to restore the backup to the test db and then keep using that db for the test environment until the next test restore.
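A small sketch of that restore-to-test routine, using SQLite from the Python standard library purely as a stand-in for whatever database you actually run. The function name, file paths, and table list are assumptions for illustration.

```python
import os
import sqlite3
from typing import Dict, List


def restore_and_check(backup_path: str, test_db_path: str,
                      expected_tables: List[str]) -> Dict[str, int]:
    """Restore a SQLite backup file into the test db and return row counts.

    Raises if the backup file is missing, an expected table is absent,
    or SQLite's own integrity check fails on the restored copy.
    """
    if not os.path.exists(backup_path):
        # sqlite3.connect would silently create an empty db otherwise
        raise FileNotFoundError(backup_path)

    src = sqlite3.connect(backup_path)
    dst = sqlite3.connect(test_db_path)
    try:
        # Connection.backup (Python 3.7+) overwrites dst with src's contents
        src.backup(dst)

        status = dst.execute("PRAGMA integrity_check").fetchone()[0]
        if status != "ok":
            raise RuntimeError(f"integrity check failed: {status}")

        counts = {}
        for table in expected_tables:
            # table names come from our own list, not from user input;
            # a missing table raises sqlite3.OperationalError here
            row = dst.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            counts[table] = row[0]
        return counts
    finally:
        src.close()
        dst.close()
```

Comparing the returned counts against what you expect from production is a cheap sanity check, and because the test environment then runs on the restored copy, a broken backup tends to get noticed fast.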

I had a drive fail a couple years ago and restored it with Backblaze. The one gotcha is it wasn't a bootable backup, so it was still a pain getting my system back to something approximating what it was before. Since then I've added a weekly bootable backup to a local USB drive. Not failsafe but good enough for my needs.