One point it misses, though, is to test your backup strategy often. When you scale fast, things break very often, and it's good to be in the practice of restoring from backups every now and then.
Just started reading a book called "High Performance MySQL", and on one of the early pages the following advice appears:
"It's an excellent idea to run a realistic load simulation on a test server and then literally pull the power plug. The firsthand experience of recovering from a crash is priceless. It saves nasty surprises later."
Same goes for testing network connectivity and failover. I can't tell you how many times I've heard things like "The automatic recovery _should_ have kicked in but..."
Having a recovery procedure and backup strategy is completely different from having actually restored a backup and recovered from a failure.
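To make that concrete, here is a rough sketch of a restore drill you could run on a schedule. It is only an illustration: it uses SQLite from the Python standard library as a stand-in for a real MySQL setup, and the names `restore_drill` and `table_counts` are made up for this example. The idea is to take a backup, restore it into a fresh database, and check that the restored copy is intact and has the same row counts as the source.

```python
import os
import sqlite3
import tempfile


def table_counts(conn):
    """Return {table_name: row_count} for every user table."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


def restore_drill(live_path, workdir):
    """Back up live_path, restore into a fresh file, and verify the result.

    Returns True only if the restored database passes SQLite's integrity
    check and has the same per-table row counts as the source.
    """
    backup_path = os.path.join(workdir, "backup.db")
    restored_path = os.path.join(workdir, "restored.db")

    # Step 1: take the backup (stand-in for your real backup job).
    live = sqlite3.connect(live_path)
    backup = sqlite3.connect(backup_path)
    live.backup(backup)
    expected = table_counts(live)
    live.close()
    backup.close()

    # Step 2: restore from the backup file only, never from the live database.
    source = sqlite3.connect(backup_path)
    restored = sqlite3.connect(restored_path)
    source.backup(restored)
    source.close()

    # Step 3: verify the restored copy.
    intact = restored.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    actual = table_counts(restored)
    restored.close()
    return intact and actual == expected


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    live_path = os.path.join(workdir, "live.db")
    conn = sqlite3.connect(live_path)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("a",), ("b",)])
    conn.commit()
    conn.close()
    print("restore drill passed:", restore_drill(live_path, workdir))
```

Row counts are a weak check on their own. A real drill would also want to run a few application queries against the restored copy and time how long the restore takes.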
Thanks! Good point. We actually repurposed our offsite database recovery to clone slaves off a master (after LVM was no longer performing), so that's a great way to get more testing in.