Netflix site is down | HN Mirror

	Netflix site is down (outage.report)
	34 points by tomerific 3617 days ago

7 comments

jaytaylor 3617 days ago

Question:

Given the relatively limited amount of static content they distribute, and what seem like only daily updates, how is there not a switch they can flip when things go south to spin the service up in another region or on another provider?

Seems like it'd be the logical thing to do, given that AWS is always going to have another outage and NFLX has lots of time and smart engineers to plan and prepare for these eventualities.

bbarn 3617 days ago

I think you might underestimate the scale here. It's more like "always on, in all regions, with as many providers as they can". This is a company that's had to innovate in basically every possible business space it can to keep delivering what it has been.

tracker1 3617 days ago

It's also not entirely static: there are a lot of checks in place because of licensing models, and beyond that there are different encodings and quality levels of streams to support various clients.

jaytaylor 3617 days ago

I understand the site isn't static, but fundamentally what they are serving are static video streams. Encoding for video streams of varying quality levels is entirely pre-computed, and thus the streams seem like static assets. Anyway, my gripe is that I am not seeing the good reason(s) for not having a working failover plan ready to go at all times for the service driving a publicly traded company. Even scale doesn't seem like a good reason, as I'm sure Google GCE would love to get a few slices of the Netflix pie. So I'm just left perplexed.

mioelnir 3617 days ago

The video streams are delivered from their OpenConnect appliances. The video encoding, their actual website and all the client interaction is run in AWS, active/active in three regions (and multiple availability zones per region).

The AWS part is also very dynamic: at any given time most customers are (unknowingly/behind the scenes) participating in 8-10 beta features.

That said, this is all based on talks and presentations they have given at various conferences in the past. It could be different, especially some AWS parts.
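As a rough illustration of what "active/active in three regions" implies when one region fails, here is a minimal sketch. It is not Netflix's actual mechanism: the region names, the even three-way split, and the `redistribute` helper are all assumptions made up for this example.

```python
from typing import Dict


def redistribute(weights: Dict[str, float], failed: str) -> Dict[str, float]:
    """Shift a failed region's traffic share onto the surviving regions,
    in proportion to the share each survivor already carries."""
    if failed not in weights:
        raise KeyError(f"unknown region: {failed}")
    survivors = {r: w for r, w in weights.items() if r != failed}
    total = sum(survivors.values())
    if total == 0:
        raise RuntimeError("no healthy region left to absorb traffic")
    # Renormalise so the surviving shares add up to 1 again.
    return {r: w / total for r, w in survivors.items()}


# Hypothetical region names and an even three-way split (assumptions).
traffic = {"region-a": 1 / 3, "region-b": 1 / 3, "region-c": 1 / 3}
after = redistribute(traffic, "region-a")
print(after)
```

The point of the sketch is that each surviving region must have headroom for its larger share, which is one reason such a setup is harder than flipping a single switch.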

teraflop 3617 days ago

Obviously they do have a failover plan, but no plan is infallible -- especially when it involves a complex distributed software system plus human decision-making.

You never notice all the times when the failover is executed smoothly with no interruption in service, just the times when something goes wrong.

seanp2k2 3617 days ago

And I promise that there are failovers, simulations, testing, smaller issues, moving loads around, etc. happening all the time behind the scenes. Getting caught out is no fun, but it's a very small percentage compared with the times when changing the tires on the bus while it's driving down the freeway goes [mostly] without a hitch.

mali9 3617 days ago

Yes, you are right. In fact, I was wondering the same thing. They also make sure their systems are resilient by testing scenarios ranging from a single instance going down [1] to a whole data center going down [2], and yet this happens. I guess we have to wait until the post-mortem report comes in on this.

[1] http://techblog.netflix.com/2012/07/chaos-monkey-released-in...
[2] http://techblog.netflix.com/2011/07/netflix-simian-army.html
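The "one instance going down" scenario can be sketched as a toy, in-memory test. This is not the Chaos Monkey tool itself and it touches no real infrastructure; the instance names, the `min_required` threshold, and both helper functions are hypothetical.

```python
import random
from typing import List


def terminate_random_instance(instances: List[str], rng: random.Random) -> str:
    """Remove one randomly chosen instance from the pool and return its name."""
    victim = rng.choice(instances)
    instances.remove(victim)
    return victim


def service_is_healthy(instances: List[str], min_required: int) -> bool:
    """The service counts as healthy while enough instances remain."""
    return len(instances) >= min_required


# Hypothetical pool of four instances; the service needs at least three.
pool = ["i-1", "i-2", "i-3", "i-4"]
rng = random.Random(0)  # seeded only so the example is repeatable
victim = terminate_random_instance(pool, rng)
print(f"terminated {victim}; healthy: {service_is_healthy(pool, min_required=3)}")
```

Running such a check regularly is meant to surface missing redundancy before a real failure does, although it only covers the failure modes someone thought to simulate.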

pfarnsworth 3617 days ago

They do this all the time. They switch back and forth between regions very frequently to test exactly this scenario.

Aelinsaar 3617 days ago

It's still down too... on a Saturday night they must be absolutely hounded with complaints.

freyir 3617 days ago

Working fine on my end.

kpcyrd 3617 days ago

Meta: why does https://outage.report redirect to http://outage.report ?

foepys 3617 days ago

I have heard that some sites reported a 30% decline in ad revenue as soon as they started using HTTPS. I don't know the reason for this, though.

captn3m0 3617 days ago

The reason is probably that not all ad networks support HTTPS, and you can't make the same money on HTTPS-only ads (since fewer networks will bid on them). Putting an HTTP ad on an HTTPS page would now guarantee that it is not seen, so sticking with HTTP makes sense that way.

mali9 3617 days ago

Does Netflix have a standard status dashboard like other services do?

trimbo 3617 days ago

Do any consumer sites have that? Seems like more of an enterprise-SLA type thing.

ncphillips 3617 days ago

Interesting, I cannot log in on their website, but I can access it on my phone.

cesarbs 3617 days ago

I'm having fun watching the outrage on the Twitter feed.

UnoriginalGuy 3617 days ago

It appears as if it just came back up for my region. Literally in the last five minutes.

Obviously people's mileage may vary, since it could be region dependent.