|
I think that’s a great policy as it’s clearly intended to help people when they need it, and get people to unplug when it’s valued by their loved ones. _However_ (that part is probably best bookmarked until Jan 2nd), it also betrays that your system is brittle and can be broken by a bad commit. Don’t do it because you want people to grind until Dec 24th at 6 pm. Do it because it’s great the rest of the year, too. I’d recommend you look into (or ask me about) feature flags, alerting, and automated roll-backs. The short version is: there’s a meta-system on top of your release process that can tell (if you are using roll-backs, not feature flags):
- commits until xyzsdf are fine;
- roll-outs starting from commit abcdef have a 2% error rate, 80% on Android;
- revert to xyzsdf, send a message (low-priority, email) to the DevOps on call and the author of abcdef that it happened;
- for all commits after abcdef: if there are no conflicts with xyzsdf, re-try to roll them out;
- if there is a conflict because they were on top of abcdef, send a message (low-priority email) to the authors that there is a conflict.
There are more sophisticated versions that can do things like, if you use feature flags, flagging Android users to use the previous version. Another way to do this is to scale who has access to abcdef gradually: say 1% every hour, and revert if you detect issues. All those seem daunting to teams that haven’t worked like this before, but in my experience, they come to love it very fast. |
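The roll-back steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how any particular release tool works: the `Commit` and `Plan` types, the `plan_rollback` function, the 1% threshold, the `devops-on-call` recipient, and the idea of detecting conflicts by overlapping file sets are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    files: frozenset  # hypothetical stand-in for real conflict detection


@dataclass
class Plan:
    revert_to: Optional[str] = None
    retry: list = field(default_factory=list)   # commits to roll out again
    notify: list = field(default_factory=list)  # (recipient, message) pairs


def plan_rollback(commits, error_rate, threshold=0.01, on_call="devops-on-call"):
    """commits: oldest first; error_rate: sha -> observed error rate of its roll-out."""
    plan = Plan()
    # Find the first commit whose roll-out exceeds the error threshold.
    bad_index = next(
        (i for i, c in enumerate(commits) if error_rate.get(c.sha, 0.0) > threshold),
        None,
    )
    if bad_index is None or bad_index == 0:
        return plan  # nothing to revert, or no known-good commit to fall back to
    bad, good = commits[bad_index], commits[bad_index - 1]
    plan.revert_to = good.sha
    # Low-priority message to the on-call and to the author of the bad commit.
    message = f"rolled back {bad.sha} to {good.sha}"
    plan.notify.append((on_call, message))
    plan.notify.append((bad.author, message))
    # Later commits are retried unless they were built on top of the bad one.
    for later in commits[bad_index + 1:]:
        if later.files & bad.files:
            plan.notify.append((later.author, f"{later.sha} conflicts with the revert of {bad.sha}"))
        else:
            plan.retry.append(later.sha)
    return plan


commits = [
    Commit("xyzsdf", "alice", frozenset({"api.py"})),
    Commit("abcdef", "bob", frozenset({"checkout.py"})),
    Commit("c3", "carol", frozenset({"checkout.py"})),
    Commit("c4", "dave", frozenset({"docs.md"})),
]
plan = plan_rollback(commits, {"abcdef": 0.02})
print(plan.revert_to, plan.retry, [who for who, _ in plan.notify])
```

The function only produces a plan; actually reverting, re-deploying, and sending the emails is left to whatever tooling the team already has.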
/However/, let me counter with the point: just one of our customers has 8000 FTEs working with our system. During hell-time (aka, December and Christmas shopping and shipping), each of those dudes spends their shift taking customer calls lasting 2-4 minutes, each of which in turn requires a few requests into our systems.
Due to the stress of their customers^2 (because it's Christmas and holidays and such), if an agent of a customer is unable to access our systems, they cannot handle the use case of the customer^2, and that will piss off the customer of the customer.
So if we push a bad change during this time, we're going to piss off hundreds of customers^2 per minute for that one customer alone. Even with a fast automatic rollback, that's a long time during hell-time. And they have people who don't like that and who know how to yell at vendors in nasty ways.
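To put rough numbers on that, here is a back-of-the-envelope sketch. Only the 8000 FTEs and the 2-4 minute calls come from the figures above; the share of agents on shift at once (25%), the 3-minute average call, and the 5-minute detect-and-rollback window are assumptions picked purely for illustration, and the function names are made up.

```python
def affected_calls_per_minute(total_agents, shift_fraction, avg_call_minutes):
    """Calls handled per minute, each of which fails if our systems are unreachable."""
    agents_on_shift = total_agents * shift_fraction  # shift_fraction is an assumption
    return agents_on_shift / avg_call_minutes


def affected_calls_during_outage(calls_per_minute, outage_minutes):
    """Calls hit between the bad roll-out and the completed rollback."""
    return calls_per_minute * outage_minutes


# 8000 FTEs from above; 25% on shift, 3-minute calls, and a 5-minute
# detect-and-rollback window are illustrative assumptions.
per_minute = affected_calls_per_minute(8000, shift_fraction=0.25, avg_call_minutes=3)
during_outage = affected_calls_during_outage(per_minute, outage_minutes=5)
print(f"{per_minute:.0f} calls/min, {during_outage:.0f} calls hit before the rollback completes")
```

Plug in your own shift share and rollback time; the point is only that the damage scales with both.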
I enjoy moving software fast and enabling software to move quickly, but customer focus and customer orientation mean understanding when to move slowly as well.
And hey, if that means more quiet holidays for the hard-working operators on my team, who's gonna complain?