|
|
|
|
|
by vhold
1222 days ago
|
|
This is an example of why you want interoperable diversity in complex distributed systems. By having everything so standardized and consistent, they had the exact same failure mode everywhere and lost redundant fault tolerance. If they had different interoperable switches, running different software, the outage wouldn't have been absolute. When large complex distributed systems grow organically over time, they tend to wind up with diversity. It usually takes a big centralized project focused on efficiency to destroy that property. |
|
The practical downsides of this diversity live in the complexity of the interop (often slowing feature velocity), operations, and procurement/support.
But issues like the AT&T 4ESS outage have occurred before in IP networks, as an example, in some BGP bug. Diversity alleviates some of the global impact.