It's also not entirely static: there are a lot of checks in place because of licensing models, and beyond that there are different encodings and quality levels of streams to support various clients.
I understand the site isn't static, but fundamentally what they are serving are static video streams. Encoding for video streams of varying quality levels is entirely pre-computed, so the results seem like static assets. Anyway, my gripe is that I don't see a good reason for not having a working failover plan ready to go at all times for the service driving a publicly traded company. Even scale doesn't seem like a good reason, as I'm sure Google GCE would love to get a few slices of the Netflix pie. So I'm just left perplexed.
The video streams are delivered from their OpenConnect appliances. The video encoding, their actual website and all the client interaction are run in AWS, active/active in three regions (with multiple availability zones per region).
The AWS part is also very dynamic: at any given time most customers are (unknowingly, behind the scenes) participating in 8-10 beta features.
That said, this is all based on talks and presentations they have given at various conferences in the past. Things could be different now, especially some of the AWS parts.
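To make the active/active idea above a bit more concrete, here is a minimal sketch of how traffic might be re-spread when one region drops out. This is purely illustrative and not how Netflix actually does it: the `Region` type, the region names, the weights and the `route_shares` helper are all made up for the example, and the health check itself is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str      # hypothetical region label
    healthy: bool  # outcome of some health check (assumed, not shown here)
    weight: float  # relative traffic share when every region is healthy


def route_shares(regions):
    """Return {region name: traffic share}, using healthy regions only."""
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        # Nothing left to fail over to; a real system would need a plan for this too.
        raise RuntimeError("no healthy region left to fail over to")
    total = sum(r.weight for r in healthy)
    # Renormalize so the surviving regions absorb the failed region's traffic.
    return {r.name: r.weight / total for r in healthy}


# Three equally weighted regions, one of which is currently failing.
regions = [
    Region("region-a", True, 1.0),
    Region("region-b", False, 1.0),
    Region("region-c", True, 1.0),
]
shares = route_shares(regions)
```

With `region-b` marked unhealthy, the two remaining regions each take half of the traffic. The hard part in practice is everything this leaves out: deciding when a region is really unhealthy, and making sure the survivors have the capacity to absorb the extra load.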
Obviously they do have a failover plan, but no plan is infallible -- especially when it involves a complex distributed software system plus human decision-making.
You never notice all the times when the failover is executed smoothly with no interruption in service, just the times when something goes wrong.
And I promise that there are failovers, simulations, testing, smaller issues, moving loads around, etc. happening all the time behind the scenes. Getting caught out is no fun, but it happens only a very small percentage of the time; most of the time, changing the tires on the bus while it's driving down the freeway goes [mostly] without a hitch.