by grogenaut
1273 days ago
CI/CD, flags, and canaries don't catch everything, and a release can still cause outages for others. We try to do pretty heavy CI/CD where I work, but not everyone does (we, like everyone, have old systems). It's actually quite easy for us to have the well-behaved systems honor release hours or not, depending on how their release history has gone, their coverage, etc. But they're well behaved, so they usually have great tests and they're not usually panicked about rolling out after hours. They have their sh*t together.

The reason we only allow releases during core hours without director approval (i.e. director approval is required outside core hours) is so you don't piss off another team by paging them after hours, and so you aren't trying to shove out a change on a system that doesn't have good coverage, or by turning off the safeties.

In a large company I've noticed many engineers assume urgency even where there isn't any. As an approver myself, most of the time someone wants to rush, it's because they haven't even had the convo with their manager about whether it's worth the risk. They're assuming urgency because that's when the sprint ends, or because of what some TPM added to a Jira ticket 4 months ago. I admit that sounds risky in itself (the engineers not having the right risk training), but this is why we have a policy and tooling. Most of the times I've dug in, they're just very new and worried about perception as a new employee, so my job is to shepherd them through having that convo with their manager, which inevitably ends with the manager saying "yes it can totally wait till Monday". The change is inevitably a bit hotter than it should be due to accidental deadline pressure.
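
As a rough illustration, a release gate along these lines could be sketched in Python. Everything here is hypothetical: the `Service` fields, the core-hours window, and the coverage and release-history thresholds are assumptions made up for the example, not the actual tooling or numbers described above.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Service:
    name: str
    test_coverage: float       # fraction of code covered, 0.0 to 1.0
    recent_bad_releases: int   # rollbacks or incidents in some recent window


# Hypothetical values, chosen only for illustration.
CORE_START = time(9, 0)
CORE_END = time(16, 0)
MIN_COVERAGE = 0.8
MAX_BAD_RELEASES = 0


def in_core_hours(when: datetime) -> bool:
    # weekday() returns 0 for Monday, so 0 through 4 are weekdays.
    return when.weekday() < 5 and CORE_START <= when.time() < CORE_END


def is_well_behaved(svc: Service) -> bool:
    # "Well behaved" here means good coverage and a clean recent release history.
    return (svc.test_coverage >= MIN_COVERAGE
            and svc.recent_bad_releases <= MAX_BAD_RELEASES)


def may_release(svc: Service, when: datetime, director_approved: bool = False) -> bool:
    # Inside core hours anyone may release.
    if in_core_hours(when):
        return True
    # Well-behaved systems are exempt from the release-hours rule.
    if is_well_behaved(svc):
        return True
    # Everyone else needs explicit director approval outside core hours.
    return director_approved
```

With this sketch, a poorly covered legacy service is blocked on a Friday evening unless `director_approved=True` is passed, while a well-tested service with a clean history goes through either way.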