by tlarkworthy 1661 days ago
I remember trying to productionize an ordered service, and the SREs were banging on about messages-of-death. Their band-aid solution was isolated regions, i.e. DO NOT LET YOUR REGIONAL SERVICES COMMUNICATE. What they were worried about was global cascading failures if/when someone pushes a mistake to prod.

It's kind of a shitty solution to the problem, but there you have it; maybe it's the best that can be done. Roll out code changes gradually, one region at a time, and make sure a bug doesn't bring everything down.
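
For what it's worth, the gradual-rollout part can be sketched roughly like this. It is a minimal sketch, not what those SREs actually ran: `staged_rollout`, `deploy`, `healthy`, and `rollback` are hypothetical names, and the health check is assumed to be something you supply yourself.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def staged_rollout(
    regions: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> Tuple[List[str], Optional[str]]:
    """Deploy to one region at a time and halt at the first unhealthy one.

    Returns (regions that were updated successfully, region that failed or None).
    All four callables are hypothetical hooks supplied by the caller.
    """
    updated: List[str] = []
    for region in regions:
        deploy(region)
        if not healthy(region):
            # Undo only the bad region; later regions are never touched,
            # so the bug stays contained to a single region.
            rollback(region)
            return updated, region
        updated.append(region)
    return updated, None
```

The point is that the loop never touches the next region until the current one looks healthy, so a bad push costs you one region instead of all of them. It only helps if the regions really are isolated; if they talk to each other, a message-of-death can still spread past the region you stopped at.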