HN Mirror

by jrochkind1 1654 days ago

> I wish we would just throw up a generic "Shit's Fucked Up. We Don't Know Why Yet, But We're Working On It" message.

I gotta say, the implication that you can't register an outage until you know why it happened is pretty damning. The status page is where we look to see if services are affected; if that information can't be shared there until you understand the cause, that's very broken.

The AWS status page has become kind of a joke to customers.

I was encouraged to see the announcement in OP say that there is "a new version of our Service Health Dashboard" coming. I hope it can provide actual capabilities to display, well, service health.

From how people talk about it, it kind of sounds like updates to the Service Health Dashboard are currently a purely manual process, rather than automated monitoring updating the Service Health Dashboard in any way at all. I find that a surprising implementation for an organization of Amazon's competence and power. That alarms me more than who it is that has the power to manually update it; I agree that I don't have enough knowledge of AWS internal org structures to have an opinion on whether it's the "right" people or not.

I suspect AWS must have internal service health pages that are actually updated automatically by monitoring, that is, pages that actually work to display service health. It seems like a business decision rather than a technical challenge if the public-facing system has no inputs but manual human entry. But that's just how it seems from the outside; I may not have full information. We only have what Amazon shares with us, of course.
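
To make the manual-versus-automated distinction concrete, here is a minimal sketch of a status derived from monitoring rather than typed in by a human. Everything in it is an assumption for illustration: the `ProbeResult` type, the `derive_status` function, the service names, and the failure-rate thresholds are hypothetical and say nothing about how AWS's dashboard actually works.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "operational"
    DEGRADED = "degraded"
    OUTAGE = "outage"


@dataclass
class ProbeResult:
    service: str  # hypothetical service name
    ok: bool      # whether one health check succeeded


def derive_status(results, degraded_at=0.1, outage_at=0.5):
    """Map each service's probe failure rate to a status.

    The thresholds are illustrative assumptions, not real values.
    """
    totals, failures = {}, {}
    for r in results:
        totals[r.service] = totals.get(r.service, 0) + 1
        failures[r.service] = failures.get(r.service, 0) + (0 if r.ok else 1)

    statuses = {}
    for service, total in totals.items():
        rate = failures[service] / total
        if rate >= outage_at:
            statuses[service] = Status.OUTAGE
        elif rate >= degraded_at:
            statuses[service] = Status.DEGRADED
        else:
            statuses[service] = Status.OK
    return statuses


if __name__ == "__main__":
    probes = (
        [ProbeResult("api", False)] * 6
        + [ProbeResult("api", True)] * 4
        + [ProbeResult("storage", True)] * 10
    )
    for service, status in derive_status(probes).items():
        print(service, status.value)
```

In a design like this a human could still override or annotate the result, but the default would come from the probes instead of waiting on someone to flip a switch.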

theneworc 1654 days ago

Can you please help me understand why you, and everyone else, are so passionate about the status page?

I get that it not being updated is an annoyance, but I cannot figure out why it is the single most discussed thing about this whole event. I mean, entire services were out for almost an entire day, and if you read HN threads it would seem that nobody even cares about lost revenue/productivity, downtime, etc. The vast majority of comments in all of the outage threads are screaming about how the SHD lied.

In my entire career of consulting across many companies and many different technology platforms, never once have I seen or heard of anyone even looking at a status page outside of HN. I'm not exaggerating. Even over the last 5 years when I've been doing cloud consulting, nobody I've worked with has cared at all about the cloud provider's status pages. The only time I see it brought up is on HN, and when it gets brought up on HN it's discussed with more fervor than most other topics, even the outage itself.

In my real life (non-HN) experience, when an outage happens, teams ask each other "hey, you seeing problems with this service?" "yea, I am too, heard maybe it's an outage" "weird, guess I'll try again later" and go get a coffee. In particularly bad situations, they might check the news or ask me if I'm aware of any outage. Either way, we just... go on with our lives? I've never needed, nor have I ever seen people need, a status page to inform them that things aren't working correctly, but if you read HN you would get the impression that entire companies of developers are completely paralyzed unless the status page flips from green to red. Why? I would even go as far as to say that if you need a third party's SHD to tell you if things aren't working right, then you're probably doing something wrong.

Seriously, what gives? Is all this just because people love hating on Amazon and the SHD is an easy target? Because that's what it seems like.

aflag 1654 days ago

A status page gives you confidence that the problem indeed lies with Amazon and not your own software. I don't think it's very reasonable to notice issues, ask other teams if they are also having issues, and if so, just shrug it off and get a cup of coffee without more investigation. Just because it looks like the problem is with AWS, you can't be sure until you investigate further, especially if the status page says it's all working fine.

I think it goes without saying that having an outage is bad, but having an outage which is not confirmed by the service provider is even worse. People complain about that a lot because it's the least they could do.

phlakaton 1654 days ago

I care about status pages, because when something breaks upstream I need to know whether it's an issue I need to report, and whether there are additional problems related to the outage I need to look out for, or workarounds I can deploy. If I find out anything that might help me narrow down the ETA for a fix, that's bonus fries.

I don't gripe about it on HN, but it is generally a disappointment to me when I stumble upon something that looks like a significant outage but a company is making no indication that they've seen it and are working on it (or waiting for something upstream of them, as sometimes happens).

femiagbabiaka 1654 days ago

It is extremely common for customers to care about being informed accurately about downtime, and not just for AWS. I think your experience of not caring and not knowing anyone who cares may be an outlier.

glogla 1654 days ago

> Can you please help me understand why you, and everyone else, are so passionate about the status page?

I don't think people are "passionate about status page." I think people are unhappy with someone they are supposed to trust straight up lying to their face.

swasheck 1654 days ago

aws isn’t a hobby platform. businesses are built on aws and other cloud providers. those businesses’ customers have the expectation of knowing why they are not receiving the full value of their service.

it makes sense that, as part of marketing yourself as a viable infrastructure upon which other businesses can operate, you’d provide more granular and refined communication to allow better communication up and down the chain, instead of forcing your customers to rca your service in order to communicate to their customers.

avereveard 1654 days ago

https://aws.amazon.com/it/legal/service-level-agreements/

There's literally millions on the line.
