| > Seems logical to assume that could be because there aren't many people left who know how all the systems connect. It's only logical presupposing a lot of other conditions, each of which is worthy of healthy skepticism. And even then, it's only a hypothesis. You need evidence to go from "this could have contributed to the problem" to "this caused the problem." Based on what little is given in the article, it seems to go strongly against this hypothesis. For example it links to multiple past findings that Amazon's notification times need improvement going back to 2017. If something has been a problem for nearly a decade, it's hard to imagine it is a result of any recent personnel changes. TFA does not establish how many AWS workers have left or been laid off, nonetheless how many of those were actually undesirable losses of highly skilled individuals. Even if we take it on faith that a large number of such individuals were lost, it is another bridge further to claim that there was neither redundancy in that skillset which remained, nor that any vacancies have been left unfilled since. No evidence is given that indicates that if a more experienced team were working on the problem it would have been identified and resolved faster. The article even states something to the opposite effect: > AWS is very, very good at infrastructure. You can tell this is a true statement by the fact that a single one of their 38 regions going down (albeit a very important region!) causes this kind of attention, as opposed to it being "just another Monday outage." At AWS's scale, all of their issues are complex; this isn't going to be a simple issue that someone should have caught, just because they've already hit similar issues years ago and ironed out the kinks in their resilience story. Indeed, the article doesn't even provide evidence that the response was unreasonably slow. No comparison to similar outages either from AWS in the past, before the hypothecated brain drain, nor from competitors. Note that the author has no idea what the problem actually was, or what AWS had to do to diagnose the issue. |