by adrianco
5096 days ago
Netflix is designed to run on two out of three availability zones in a region. We have tens of TB of customer data triple-replicated within that region, with an off-region archive, but we don't live-replicate the data-intensive sources across regions. We also have the Europe region, which does live-replicate things like membership data, since all members are global members of Netflix. In this case we hit some bugs. We should have seen a two-minute increase in error rate as a third of the clients retried, after which the dead instances would have been taken out of traffic. That's what happened in the previous power outage: fewer instances went down, and it didn't trigger this bug.
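To make the "a third of the clients retried" arithmetic concrete, here is a minimal sketch, assuming clients are spread evenly across three zones by round-robin and retry a failed request once against the next zone. The zone names and the `request`/`simulate` helpers are hypothetical illustrations, not Netflix's actual client or routing logic.

```python
# Hypothetical sketch: three zones, clients assigned round-robin, and a
# failed first attempt is retried once against the next zone in the ring.
# Zone names are illustrative, not real AZ identifiers.
ZONES = ["zone-a", "zone-b", "zone-c"]


def request(client_id, healthy):
    """Return (first_attempt_ok, final_ok) for one client's request."""
    first = ZONES[client_id % len(ZONES)]
    if first in healthy:
        return True, True
    # Assumed retry policy: try the next zone once.
    retry = ZONES[(client_id + 1) % len(ZONES)]
    return False, retry in healthy


def simulate(n_clients, healthy):
    """Return (first-attempt error rate, error rate after one retry)."""
    results = [request(i, healthy) for i in range(n_clients)]
    first_failures = sum(1 for ok, _ in results if not ok)
    final_failures = sum(1 for _, ok in results if not ok)
    return first_failures / n_clients, final_failures / n_clients


if __name__ == "__main__":
    # One of three zones is down.
    print(simulate(300, {"zone-a", "zone-b"}))
```

With one zone down, roughly a third of first attempts fail, but every retry lands on a surviving zone, so the user-visible error rate after retry stays at zero. That transient spike is the "should have" behavior described above, before the dead instances are removed from traffic.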