by miken123 1651 days ago
Well, the narrative is sort of what Amazon is asking for, heh?

The whole us-east-1 management console is gone, and what is Amazon posting for the management console on their website?

"Service degradation"

It's not a degradation if it's outright down. Use the red status a little more often; this is a "disruption", not a "degradation".

5 comments

Yeah no kidding. Is there a ratio of how many people it has to be working for to be in yellow rather than red? Some internal person going “it works on my machine” while 99% of customers are down.
I've always wondered why services are not counted as down more often. Is there some sliver of customers who have access to the management console, for example?

An increase in error rates - no biggie, any large system is going to have errors. But when 80%+ of customer loads in the region are impacted (across availability zones, for whatever good those do) - that counts as down, doesn't it? Error rates in one AZ - degraded. Multi-AZ failures - down?

SLAs. Officially acknowledging an incident means that they now have to issue the SLA credits.
The outage dashboard is normally only updated if a certain $X percent of hosts / service is down. If the EC2 section were updated every time a rack in a datacenter went down, it would be red 24x7.

It's only updated when a large percentage of customers are impacted, and most of the time this number is less than what the HN echo chamber makes it appear to be.

I mean, sure, there are technical reasons why you would want to buffer issues so they're only visible if something big went down (although one would argue that's exactly what the "degraded" status means).

But if the official records say everything is green, a customer is going to have to push a lot harder to get the credits. There is a massive incentive to “stay green”.

Yes, there were. I'm from Central Europe and we were at least able to get some pages of the console in us-east-1, but I assume this was more caching related. Even though the console loaded and worked for listing some entries, we weren't able to post a support case or view SQS messages, etc.

So I agree that degraded is not the proper wording, but it was not completely gone either. So... hard to tell what a commonly acceptable wording would be here.

From France, when I connect to "my personal health dashboard" in eu-west-3, it says several services are having "issues" in us-east-1.

To your point, for support center (which doesn't show a region) it says:

Description

Increased Error Rates

[09:01 AM PST] We are investigating increased error rates for the Support Center console and Support API in the US-EAST-1 Region.

[09:26 AM PST] We can confirm increased error rates for the Support Center console and Support API in the US-EAST-1 Region. We have identified the root cause of the issue and are working towards resolution.

I'm part of a large org with a large AWS footprint, and we've had a few hundred folks on a call nearly all day. We have only a few workloads that are completely down; most are only degraded. This isn't a total outage; we are still doing business in east-1. Is it "red"? Maybe! We're all scrambling to keep the services running well enough for our customers.
Because the console works just fine in us-east-2, and the console entry on the status page does not display regions.

If the console works 100% in us-east-2 and not in us-east-1, why would they mark the console as completely down in us-east?

Well you know, like when a rocket explodes, it's a sudden "unexpected rapid disassembly" or something...

And a cleaner is called a "floor technician".

Nothing really out of the ordinary for a service to be called degraded while "hey, the cache might still be working right?" ... or "Well you know, it works every other day except today, so it's just degradation" :-)