Hacker News
by shreyas056 5088 days ago
All the zones are geographically separate, so it's equivalent to putting servers in different clouds/colos, and the probability that all of them get "blown" away simultaneously is very low.

And yet all it took was a single AWS availability zone going away for a short while for them to have a major outage.
Per their last blog post, the major issues in Netflix's case were due to bugs in their environment not properly failing away from dead ELBs. The issues were also related to API backups caused by everyone rushing to launch new instances in a new AZ, but existing services in other AZs continued to work fine.
My reading of the Netflix announcement was that it wasn't just a bug: they made the conscious decision to include manual intervention in the process (of releasing dead instances), but grossly underestimated the time required to do this across an entire zone.