Yeah, I don't think I'd go with less than RAID-6 (or full system redundancy plus 1 drive redundancy in each). Rebuilds just take too long, even with an in-chassis spare on RAID5.
Unfortunately Areca is really the only controller I've found which is well supported and does RAID6 fast.
would those be managed at that price? because it's a hell of a lot more expensive when you factor in the cost of devops to make sure it stays working and fails over properly.
I know what you mean. I have a lot of issues with AWS, but the AWS console is exactly what my manager needs so he can do things himself. Simple things such as AWS load balancing fails when we get any decent amount of traffic.
Luckily my RDS wasn't affected, but ELB merrily sent traffic to the affected zone for 30 minutes. (Either that or part of the ELB was in the affected zone and was not removed from rotation.)
We pay a lot to stay multi-AZ and it seems Amazon keep finding ways to show us their single points of failures.
Similar thing happened to me a while ago with a vendor. When your management team summons you to ask why the hell their site is down, you can't point fingers at the vendor if their marketing literature says it doesn't go down.
If you don't host your data in several alternative dimensions so that the same events wouldn't transpire in all of them - why not assume you'll encounter the occasional outage?
Did/does your standby replica in another AZ have any instance notifications stating there is a failure? The outage report claims there were just EBS problems in only one AZ.
No, nothing unusual with our standby replica. It's not even clear if it was the standby or our primary that was in the affected AZ.
Multi-AZ RDS does synchronous replication to the standby instance -- I'm guessing something broke in there. Hopefully AWS will update with a post mortem as they usually do. Lots of frustrated MultiAZ RDS customers on their forums.
Yeah unfortunately it looks to be an EBS problem and if your underlying EBS volume housing your primary DB instance takes a dump then that is unfortunately going to cause replication to fall over too
Multi-AZ RDS deployment is supposed to protect you from that though. That's why it's 2x the price. We should have failed over to a different AZ w/o EBS issues.
If your source EBS volume is horked then you aren't going to be replicating any data to your backup host while the EBS volume is messed up (since your source data is unavailable). EBS volumes also don't cross/failover between AZ boundaries.
Maybe there was something bad with your replication server before the outage? It's hard to guess without knowing exactly what was happening at the time...
The whole point is to protect you from problems in one AZ by keeping a hot standby in another AZ. It doesn't matter whether it's due to EBS, power, etc. This is one of the primary reason to use RDS instead of running MySQL yourself on an instance.