by qxmat
1578 days ago
I've found that external tech requirements are horrible to work with, especially when the underlying stack simply doesn't support them. Normally these are pushed by certified cloud consultants or by an intrepid architect who found another "best practice blog." It begins with small requirements, such as coming up with a disaster recovery plan, only for the plan to be rejected because your stack must "automatically heal" and devs can't be trusted to restore a backup during an emergency. Blink and you're implementing redundant networking (cross-AZ route tables, DNS failover, SDN via gateways/load balancers), a ZooKeeper ensemble with >= 3 nodes in 3 AZs, per-service health checks, EFS/FSx network mounts for the persistent data that the expensive enterprise app insists on storing on disk, and some kind of HA database/multi-master SQL cluster. ... months and months of work because a 2 hour manual restore window is unacceptable. And when the dev work is finally complete after 20 zero-downtime releases over 6 months (bye weekend!) how does it perform? Abysmally - DNS caching left half the stack unreachable (partial data loss) and the mission critical Jira Server fail-over node has the wrong next-sequence id because Jira uses an actual fucking sequence table (fuck you Atlassian - fuck you!). If only the requirement had been for a DR run-book + regular fire drills.
It may be the case that 2 hours of downtime is completely unacceptable for the business, and paying $Xmm extra per year to maintain automated failover is the right call. Or it may be that the business would be horrified to learn how many dollars are being spent to avert a level of downtime that no customer would notice or care about.
If the requirement is just being set by engineering, then it's more about finding the equilibrium where the resources spent on automation balance the cost of the manual toil and the associated morale impact on the team. Nobody wants to work on a team where everything is on fire all the time, and it's time/money well spent to avert that situation.
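
To make that equilibrium concrete, here is a rough back-of-the-envelope sketch. Every number and name in it is hypothetical (the function, the dollar figures, the on-call headcount), and it deliberately ignores morale, which is real but hard to price. It only asks how many manual restores per year it would take before the automation pays for itself.

```python
def break_even_incidents_per_year(
    automation_cost_per_year: float,
    restore_hours: float,
    downtime_cost_per_hour: float,
    engineers_on_restore: int,
    engineer_cost_per_hour: float,
) -> float:
    """Number of manual restores per year at which automation breaks even.

    All inputs are illustrative assumptions, not measured figures.
    """
    # Cost of one manual restore: lost business plus the people doing the work.
    cost_per_incident = restore_hours * (
        downtime_cost_per_hour + engineers_on_restore * engineer_cost_per_hour
    )
    if cost_per_incident <= 0:
        raise ValueError("a manual restore must have a positive cost")
    return automation_cost_per_year / cost_per_incident


# Hypothetical inputs: a 2 hour manual restore window, as in the parent comment.
incidents = break_even_incidents_per_year(
    automation_cost_per_year=418_000,
    restore_hours=2,
    downtime_cost_per_hour=10_000,
    engineers_on_restore=3,
    engineer_cost_per_hour=150,
)
print(f"Automation breaks even at about {incidents:.1f} manual restores per year")
```

If the real incident rate is far below that break-even figure, the run-book plus fire drills is probably the cheaper option; if it is far above, the automation is easier to justify.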