by tlarkworthy
1661 days ago
I remember trying to productionize an ordered service, and the SREs were banging on about messages-of-death. Their band-aid solution was isolated regions, i.e., DO NOT LET YOUR REGIONAL SERVICES COMMUNICATE. What they were worried about was a global cascading failure if/when someone pushes a mistake to prod. It's kind of a shitty solution to the problem, but there you have it; maybe it's the best that can be done. Roll out code changes gradually, one region at a time, and make sure a bug in one region can't bring everything down.