by ctvo
2561 days ago
How do you think it relates to the failures exactly? Employee quality is down? The servers had someone manually monitoring them and they quit so now the servers are quietly on fire? What's the theory? Hint: companies at Google's scale do not have a single point of failure where employees slowly trickling in or out can impact their infrastructure in this way. You would see many more failures in this case. The tenure for the average employee is ~2 years.
I've worked at big companies and small ones. Every place has a small pool of extraordinary technical talent (the 10x or 100x engineers or whatever). It isn't just that these people are geniuses (although some of them def are), it's how much context they have around the systems that are critical to the functioning of the company. They have that context plus the dedication to have learned about different failure scenarios. They have probably built the automation systems that deploy the services.
When such people leave, it's not the end of the company. Someone else (either a person or a group) is usually interested and steps up to do a knowledge transfer before the person leaves, and then learns the system.
However, if a critical mass of people leave at or around the same time, crucial knowledge that is necessary for the systems to operate correctly is lost. This may not surface immediately, but when something goes wrong, you will notice it.
I'm not saying this is what happened to Google, IDK. But it's very much a possibility, even at the largest companies, especially the ones that have somewhat centralized systems, where outages tend to affect a whole bunch of seemingly unrelated services.