by flanbiscuit 2561 days ago
This is the 2nd time a Google product has been down in the last month. What is going on there?

Previous one was only 15 days ago: https://news.ycombinator.com/item?id=20077421

4 comments

Ikr, Instagram recently went down too. What's going on with FANG lately? A few years ago things like this would have been unheard of.
In what year exactly did none of Facebook, Apple, Netflix, and Google have a service outage like this?
As the person most closely responsible for Netflix’s nines from 2011-2014, definitely not 2011-2014!

We worked hard and managed to keep it pretty high, but in all that time I think we only had two or three perfect weeks worldwide.

And with the chrome/adblock mess it feels a lot worse.

My opinion is worth nothing, but Google feels like a crumbling cookie now. It used to be a cool addition to one's life.

Welp it still is.
> What is going on there?

Nothing but anecdote, but a lot of my friends at Facebook and Google are eyeing the exits.

They’ve made good money and can afford to go somewhere better aligned with their values. They’re also each remarkably talented.

These outages may be a reflection of that exodus. (Counterfactual: we started our careers at the same time and are nearing a natural switching point simultaneously.)

How do you think it relates to the failures exactly? Employee quality is down? The servers had someone manually monitoring them and they quit so now the servers are quietly on fire? What's the theory?

Hint: companies at Google's scale do not have a single point of failure where employees slowly trickling in or out can impact their infrastructure in this way. You would see many more failures in this case. The tenure for the average employee is ~2 years.

I think this is a dangerous assumption to make.

I've worked at big companies and small ones. Every place has a small pool of extraordinary technical talent (the 10x or 100x engineers or whatever). It isn't just that these people are geniuses (although some of them def are), it's how much context they have around the systems that are critical to the functioning of the company. They have that context plus the dedication to have learned about different failure scenarios. They have probably built the automation systems that deploy the services.

When such people leave, it's not the end of the company. Someone else (either a person or a group) is usually interested and steps up to do a knowledge transfer before the person leaves, and then learns the system.

However, if a critical mass of people leaves at or around the same time, crucial knowledge that the systems need to operate correctly is lost. This may not surface immediately, but when something goes wrong, you will notice it.

I'm not saying this is what happened to Google, IDK. But it's very much a possibility, even at the largest companies, especially the ones with somewhat centralized systems, where outages tend to affect a whole bunch of seemingly unrelated services.

Are you suggesting that the quality of the services offered is independent of employee quality?