by dekhn 1655 days ago
Sure, but... that just raises more questions :)

Taken literally, what you are saying is that the service could be down and an executive could override that, preventing them from paying customers for a service outage, even if the service did have an outage and the customer could prove it (screenshots, metrics from other cloud providers, many different folks seeing it).

I'm sure there is some subtlety to this, but it does mean that large corps with influence should be talking to AWS to ensure that status information corresponds with actual service outages.

3 comments

Large corps with influence get what they want regardless. Status page goes red and the small corps start thinking they can get what they want too.
> Status page goes red and the small corps start thinking they can get what they want too.

I think you mean "start thinking they can get what they pay for"

I have no inside knowledge or anything but it seems like there are a lot of scenarios with degraded performance where people could argue about whether it really constitutes an outage.
One time GCP argued that since they did return 404s on GCS for a few hours, that wasn't an uptime/latency SLA violation, so we were not entitled to a refund (though they refunded us anyway).
Man, between costs and shenanigans like this, why don't more companies self-host?
1. Leadership prefers to blame cloud when things break rather than take responsibility.

2. Cost is not an issue (until it is but you’re already locked in so oh well)

3. FAANG has drained the talent pool of people who know how

If you think that’s bad you should see the outages when you self host without a big enough team to really manage it.
Opex > Capex. If companies thought about long term, yes they might consider it. But unless the cloud providers fuck up really badly, they're ok to take the heat occasionally and tolerate a bit of nonsense.
You can lease equipment you know…
Yep. I was an SRE who worked at Google and also launched a product on Google Cloud. We had these arguments all the time, and the contract language often provides a way for the provider to weasel out.
Like I said I never worked there and this is all hearsay but there is a lot of nuance here being missed like partial outages.
This is no longer a partial outage. The status page reports elevated API error rates, DynamoDB issues, and EC2 API error rates; my company's monitoring is significantly affected (i.e., our IT folks can't tell us what isn't working), and my AWS training class isn't working either.

If this needed a CEO to eventually get around to pressing a button that said "show users the actual information about a problem", that reflects poorly on Amazon.

My friend works at a telemetry company that does monitoring, and they are working on alerting customers to cloud service outages before the cloud providers do, since the providers like to sit on their hands for a while (presumably to try to fix it before anyone notices).