by pm90 2572 days ago
GCP is incredibly bad at communicating when there are problems with their systems. Just terrible. It's only when our apps start to break that we notice something is down. Then we look at the dashboard and it's all green, which is even more infuriating.

AWS is often the same way. No one seems to be good at communicating outage details.
I suspect there's a correlation between outages that are easy to detect and communicate and outages that automation can recover from so easily that you hardly notice.
I really don’t get this. There’s a huge number of complaints about poor communication from companies like Google and AWS during every outage. Yet they remain seemingly indifferent to how much customer trust they are losing, and the competitive edge the first one to get this right could gain.
I don't think they are losing any kind of customer trust.

Unless something is really fucked (like both GCP and AWS being down for us-east) incidents like these are not going to impact them at all.

The cost of either migrating to the other provider or, even worse, migrating to more traditional hosting companies is enormous and will require much more than "service was down for 2 hours in 2019". The contracts also cover cases like this and even if they don't, Google and Amazon can and will throw in some free treat as an apology.

On one hand I find this quite sad, but from a pragmatic point of view it makes sense.

If 20% of Google Cloud's customers leave after this outage because of poor communication, they'll prioritise accordingly and apply all those nice SRE theories to their infra. But this isn't happening, because <various reasons>, so... who cares?
I mean, I care. All else being equal, I’m not sure why you wouldn’t want good communication with your customers.
How much cloud spend do you control? That's the reality of how decisions are made.
Many millions of dollars per year. I care about how my providers behave when they have issues, and I can't see why you think it's not at all relevant.
Their dashboard does show red on GCE and networking right now, for what it's worth. https://status.cloud.google.com/
Why aren’t these on separate systems? I never had the impression that Google cheaps out on things, but this sounds exactly like the sort of shit that happens when people cheap out. Not even a canary system?
The idea that Google spends big on expensive systems is a huge lie.

Google started out on a Beowulf cluster that the founders wired themselves. From the very beginning, the goal of metrics collection was to optimize costs. While today it’s seen as the cash cow, the focus has always been on cheap components strung together, relying on algorithms and code for stability and making the least possible demands of the underlying hardware.

To think that they won’t try to save money any time they can seems implausible.