Agreed. A lot of people seem to think of devops as a way to make devs' lives better without actively harming ops. The article does have a section on getting Ops on board with devops, but it's much weaker than the preceding section on getting Dev on board. Devops done right should improve both sides, because it explicitly recognizes the tradeoffs that come with the added automation, processes, and policies.

I've been reading Google's Site Reliability Engineering book ([1], free online, highly recommended so far), and it lays out pretty clearly how to do devops in a more balanced way. Key points are:

- Define service level indicators (SLIs): metrics you can track to see how you're doing on the things your users care about, such as uptime, latency, security, etc. For instance, measure uptime as successful responses / total requests.
- Define service level objectives (SLOs) based on your SLIs, e.g. 99.9% successful responses / total requests (aka 99.9% uptime).
- Establish an error budget that controls your deployments. Based on your SLOs, you can accept a certain number of errors (failed requests, outages, slow response times, etc.) in a given period. If you spend your entire error budget, no new code can be deployed. The deployment shutoff has to be agreed upon in advance and has to be a hard-and-fast rule, so it's not ops' fault that dev can't deploy code.
- If you exhaust your error budget, devs need to work on reducing the error rate going forward instead of developing new features. That can be tests, monitoring and alerting, the rollback process, or pretty much anything, but devs should know they're doing it so they can deploy more code faster in the future.

I think this makes a lot of sense and helps align dev and ops on uptime goals. Specifically, dev's ability to push code is now directly tied to service level, and ops is not shooting for 100% reliability. Dev can choose how to spend the error budget, so ops isn't dictating how they do things; and ops knows to expect a certain number of problems, so they don't resent dev every time buggy code gets pushed.

[1] https://landing.google.com/sre/book.html
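
To make the error budget arithmetic above concrete, here is a rough Python sketch. It is not taken from the SRE book: the `ErrorBudget` class, its field names, and the choice to count the budget in failed requests are all assumptions made for illustration, and a real setup would pull these counts from monitoring over an agreed window.

```python
import math
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: error budget for one request-based SLO in one window."""

    slo_target: float      # e.g. 0.999 for "99.9% successful responses"
    total_requests: int    # requests served in the window so far
    failed_requests: int   # requests that counted as errors

    @property
    def sli(self) -> float:
        """Measured SLI: successful responses / total requests."""
        if self.total_requests == 0:
            return 1.0  # assumption: no traffic means nothing has failed yet
        return (self.total_requests - self.failed_requests) / self.total_requests

    @property
    def allowed_failures(self) -> int:
        """Number of failed requests the SLO tolerates for this much traffic."""
        raw = self.total_requests * (1 - self.slo_target)
        # Round first to absorb floating-point noise, then round down.
        return math.floor(round(raw, 6))

    @property
    def remaining(self) -> int:
        """Failures still available before the budget is fully spent."""
        return self.allowed_failures - self.failed_requests

    @property
    def can_deploy(self) -> bool:
        """Hard rule agreed in advance: no deploys once the budget is spent."""
        return self.remaining > 0


# Example: 99.9% SLO, one million requests, 400 failures so far.
budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(budget.allowed_failures, budget.remaining, budget.can_deploy)
```

With these made-up numbers the SLO allows 1,000 failed requests, 600 are still unspent, and deploys stay open. Once `failed_requests` reaches 1,000, `can_deploy` flips to false and the rule kicks in without anyone on the ops side having to make the call.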