by abotsis
534 days ago
Couple thoughts here:
1. The “rightsizer” example mentioned might well have had the same outcome if the outage had been analyzed in a “traditional” sense. That said, the analysis is much easier and more actionable with this new approach.
2. I’ve always hated software testing because faults can occur external to the software being tested. It’s difficult to reason about those if you have a myopic view of just your component of the system. This line of thinking somewhat fixes that, or at least paves a path to fixing it. Unfortunately, while this article says a lot, much of it just repeats itself, and I wish there were more detail. For example: who all is involved in this process? Are there limits on what can be controlled? How (politically) does this all shake out with respect to the relationships between SREs and software engineers? Etc.
Nonetheless, there are lots of interesting concepts here, so I would like to see a Google SRE handbook-style writeup with more detail that might be of more practical value.