Hacker News
by abotsis 534 days ago
Couple thoughts here: 1. The “rightsizer” example mentioned might well have had the same outcome if the outage had been analyzed in a “traditional” sense. That said, it is much easier and more actionable with this new approach. 2. I’ve always hated software testing because faults can occur external to the software being tested. It’s difficult to reason about those if you have a myopic view of just your component of the system. This line of thinking somewhat fixes that, or at least paves a path to fixing it.

Unfortunately, while this article says a lot, much of it just repeats itself, and I wish there were more detail. For example: who all is involved in this process? Are there limits on what can be controlled? How (politically) does this all shake out with respect to the relationships between SREs and software engineers? Etc.

1 comment

Agreed, the devil is in the details for SRE functions, and the organizational details of how to leverage this framework are largely absent from this writeup. With so many teams struggling to get the organizational components right just for traditional SRE (due to budget constraints, internal politics, misunderstanding of SRE by leadership, etc.), I'd imagine implementing the changes needed to leverage the ideas in this writeup will be impossible for all but extremely deep-pocketed tech companies.

Nonetheless, lots of interesting concepts, so I would like to see a Google SRE handbook-style writeup with more detail that might be of more practical value.