by cpncrunch 2007 days ago
>both of those migrations failed with symptoms suggesting that whoever was performing them did not have deep understanding of systems architecture or safety practices and there was no one to stop them from failing.

Can any single person at Google have a full understanding of all the dependencies for even a single system? I have no idea, as I've never worked there, but I would imagine that there is a lot of complexity.

2 comments

Somehow they managed to build complex systems like Gmail, continuously develop new features there, and not have massive outages due to "migrations". That suggests that something they were doing right, they are no longer able to do.
I'm pretty sure Google has had occasional severe outages for their whole history.
A single event is not data.
There have been more than two incidents lately:

- the YouTube outage in November 2020

- the August 2020 outage of Google Suite, including Gmail

In both cases no postmortems were published.

AFAIK, Google doesn't publish public postmortems (PMs) for non-paid offerings, so YouTube doesn't get a public PM.

For the August outage, I believe there was a public PM. That said, I can't find it now (I think there was some link rot somewhere, and I've escalated that).

Two events in a single day.
To answer your question: yes. Some people do understand the dep stack. It takes years, but hey, there are lifers.