by gizmo385 2548 days ago
> There are a lot of moving pieces in our system and sometimes there are things outside of Google's control.

Are you implying that the cause of this outage is not Google's fault? If so, can you go into more detail about that?

> The disruptions with Google Cloud Networking and Load Balancing have been root caused to physical damage to multiple concurrent fiber bundles serving network paths in us-east1, and we expect a full resolution within the next 24 hours.

From the dashboard. Looks like this can be blamed on an Act of Backhoe.

Not him, but oftentimes cloud outages are due to issues with the network connections to the datacenter, or to power outages.

Datacenters also sometimes have other single points of failure such as DNS, but those are within the company's control.

https://www.networkworld.com/article/3373646/network-problem...

https://www.datacenterknowledge.com/uptime/equinix-power-out...

But data centers are typically designed with network and power failures in mind, aren't they? Isn't this why these kinds of ring-based network topologies exist, so that whenever a single network connection fails, traffic can still easily be routed around it?

Almost always, yes, but the problem is that everyone has to start routing around the problem and it creates congestion. Those redundant pipes don't sit idle. They are sharing the traffic.

As mentioned in another thread, in this case, Google has rerouted google.com traffic out of the region to try to mitigate the congestion.
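
To put rough numbers on the congestion point, here is a toy sketch. The capacities and demand are made up for illustration (nothing here reflects Google's actual network), and `link_utilization` is a hypothetical helper that just splits demand evenly across whatever links survive.

```python
def link_utilization(demand, capacities):
    """Split demand evenly across surviving links; return utilization per link.

    A capacity of 0 marks a failed link. All numbers are illustrative.
    """
    alive = [c for c in capacities if c > 0]
    if not alive:
        raise ValueError("no surviving links")
    share = demand / len(alive)
    return [share / c for c in alive]


# Two redundant paths, each sized for 100 units, sharing 140 units of traffic.
normal = link_utilization(demand=140, capacities=[100, 100])

# One path is cut: the survivor is asked to carry all 140 units.
degraded = link_utilization(demand=140, capacities=[100, 0])

print(normal)    # both links comfortably below capacity
print(degraded)  # surviving link above 1.0, i.e. oversubscribed
```

Under these assumptions each link is fine while both are up, but the survivor ends up oversubscribed after a cut, which is why moving some traffic out of the region helps.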

On a smaller scale, to link up a few datacenters that are a few miles apart? Sure. On a grand scale though, no. Nobody's running an extra undersea cable from Japan to Singapore so that they can have a ring topology. Or trenching a second PBps of cables across the Appalachian Mountains. When something like that gets busted you go and reroute your least important traffic and send out the repair crew.