| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by res0nat0r 5144 days ago
	Did/does your standby replica in another AZ have any instance notifications stating there is a failure? The outage report claims there were just EBS problems in only one AZ.

1 comments

mikebo 5144 days ago

No, nothing unusual with our standby replica. It's not even clear if it was the standby or our primary that was in the affected AZ.

Multi-AZ RDS does synchronous replication to the standby instance -- I'm guessing something broke in there. Hopefully AWS will update with a post mortem as they usually do. Lots of frustrated MultiAZ RDS customers on their forums.

link

res0nat0r 5144 days ago

Yeah unfortunately it looks to be an EBS problem and if your underlying EBS volume housing your primary DB instance takes a dump then that is unfortunately going to cause replication to fall over too

link

mikebo 5144 days ago

Multi-AZ RDS deployment is supposed to protect you from that though. That's why it's 2x the price. We should have failed over to a different AZ w/o EBS issues.

link

res0nat0r 5144 days ago

If your source EBS volume is horked then you aren't going to be replicating any data to your backup host while the EBS volume is messed up (since your source data is unavailable). EBS volumes also don't cross/failover between AZ boundaries.

Maybe there was something bad with your replication server before the outage? It's hard to guess without knowing exactly what was happening at the time...

link

mikebo 5144 days ago

I don't think you're familiar with how Multi AZ RDS works: http://aws.amazon.com/rds/faqs/#36

The whole point is to protect you from problems in one AZ by keeping a hot standby in another AZ. It doesn't matter whether it's due to EBS, power, etc. This is one of the primary reason to use RDS instead of running MySQL yourself on an instance.

link

res0nat0r 5144 days ago

Yes...what also sounds plausible is that since this was an EBS outage that the underlying EBS volume wasn't detected as being unavailable (if it in fact did become unavilable) so no failover to your other RDS server was initiated.

link