by uji 2038 days ago
Have worked at AWS before, and I can attest to this. Whenever we had an outage, our director and senior manager would take a call on whether to update the dashboard or not.

Having a 'red' dashboard catches a lot of eyes, so the people responsible for making this decision always look at it from a political point of view.

As a dev oncall, we used to get 20 sev2s per day (a sev2 is an oncall ticket which needs to be handled within 15 mins), so most of the time things are broken; it's just that it's not visible to external customers through the dashboard.

3 comments

Wow. If I were in charge, the team running a service should not be the same team who decides whether a given service is healthy. This is pretty damaging info about the unprofessional way AWS actually appears to be run.
It's funny that you point to that as the problem. The problem is more AWS' toxic engineering culture that has engineers fearing for their jobs in a way that guides their decision making. It's bad company culture, end of story.
AWS is big. Amazon is even bigger. Disgruntled people are the ones who often cry the loudest. Just because there may be teams who act like this, doesn't mean that is the case in general.

You don't hear a lot of people praising AWS, the same way you don't hear a lot of people saying how great it is to have an iPhone. If I am happy, I have little incentive to post about it, since that should be the default state.

But the fact of the matter is simple. If you end up in a team like this, switch and raise complaints afterwards. Nothing stops you from it. There is no "toxic engineering culture" at AWS. The problem is that AWS makes you into an owner and that includes owning your career. That means if you feel something is wrong, YOU are expected to act. No one will do it for you. And there are plenty of mechanisms for you to act.

This is the greatest benefit of working at Amazon, but it's also the downfall of people who are not able to own things.

> The problem is that AWS makes you into an owner and that includes owning your career.

Firing me for correctly telling customers that their services are down is not my idea of making me an owner.

You're the owner of aspects like responsibility and risk but not the owner of aspects related to financial growth (I mean, your stock options are, but that's about it).
Doing what you think is right is not necessarily the right thing to do. This is why there is also "Disagree and Commit". There are many facets to this and I am 100% sure that you did not get fired for >correctly< telling customers... You could potentially get fired for incorrectly telling them though, if the issue was severe enough.
That sounds toxic.
>AWS makes you into an owner and that includes owning your career.

This sort of corporate jargon does not exactly instill confidence. I think I'm more concerned about Amazon's engineering culture now than I was before.

I empathize with the poster. Imagine being paid less than someone who works half as hard at another company, but more than your coworkers, to say cringe stuff like that.
"You don't hear a lot of people praising AWS"

You definitely hear a lot of people praising AWS.

This is 100% wrong, and only seeks to derail the conversation. It's a toxic way to think, and it sets off a lot of red flags for me, essentially ruining their credibility.

  Disgruntled people are the ones who often cry the loudest. Just because there may be teams who act like this, doesn't mean that is the case in general.
Is right up there with "we don't know it wasn't aliens"
There are plenty of ways a work culture can make you utterly miserable yet you can't do anything about it. Perhaps you aren't confident enough, or things haven't yet reached the 'tipping point', or other options just aren't available to you for political reasons, lack of openings on other teams, lack of skills...

I think it's bigger than just "it's your problem, you own it". There are factors beyond your control.

As a customer I don't really care whether AWS has a toxic internal culture. I care about whether they have operational excellence and a high quality product. This information is showing cracks in operational excellence.
Guess what - most cloud providers are like that. My personal experience is with GCP, where stuff can be majorly on fire with no status update for hours. Cloud SLOs are lies, like a lot of other things there.
My company will update their status but puts the most vague responses up. The reason is that we don’t want to appear inept when we crash the website, for example because we ran out of disk space.

Our competitors would have a field day with that

I think this is pretty typical, as often outsiders don't have the visibility into the issue to determine whether there's an issue.
The EC2 or S3 dashboards showing red literally requires approval from ajassy himself, iirc.

The status page is entirely manually updated.

Flipping anything to red entails significant legal and business complications. For starters, you are basically admitting that customers deserve a refund for services not provided. I'm not surprised that execs must be involved in that decision. You don't want a random developer making a decision that could incur millions of dollars in potential losses when there are other strictly non-technical factors to consider.
All I see in your response is, "We don't want to tell the truth because it might cost us money."

Maybe if it started costing the company actual money, it might make the investments necessary to ensure it doesn't go down in the first place.

The point is more like "we better be sure of the scale of the issue before that is communicated publicly, and low-level devs on individual teams do not have that 10,000-foot view of the system".

You have all the power you need to make the company change its behavior. Vote with your dollar and move to a different platform. I'm sure you have recommendations to share.

Oh, what a pipe dream. If only capitalism worked how it was described in textbooks. It turns out there are much easier, lower-cost optimizations businesses can perform based on managing perception rather than worrying about pesky concepts like utility.
You raise an interesting point. Where I work, most of our public status dashboards update to yellow or red automatically, with only a few failure conditions requiring a manual update. It’s always made me wonder whether we’ll ever get around to implementing capitalism with some manual update only dashboards.
Given enough lawsuits and mistakes by devs flipping dashboards to red with a bad code change or network provider outage, your org will have a manual public-facing dashboard as well.
I never considered Amazon capitalistic given their exploitation of the USPS.

I considered them a private company subsidized by taxes.

This might be an oversimplification.

With any customer that has SLAs written into their contracts, they're not just going off your status page. They most likely have a direct point of contact and exact reporting will be done in the postmortem.

The status page is for customers for which there aren't significant legal or business complications and exists to provide transparency. In my opinion you do want "random" people at your company to be able to update it in order to provide very stressed out customers with the best information you have.

As an industry we probably should recognize this more explicitly and have more standard status pages that are like "everything might be broken but we're not sure yet"
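
One way to picture that kind of status page is a minimal sketch like the one below. It assumes a hypothetical `error_rate` metric, an arbitrary 5% threshold, and a hypothetical `confirmed` flag set by a human reviewer; none of these come from any real provider's tooling.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    GREEN = "operating normally"
    UNCONFIRMED = "everything might be broken but we're not sure yet"
    RED = "confirmed outage"


def public_status(error_rate: float, confirmed: Optional[bool]) -> Status:
    """Map an automated signal plus an optional human verdict to a status.

    error_rate: fraction of failed requests (hypothetical metric).
    confirmed:  None while nobody has reviewed the incident yet,
                True/False once a human has confirmed or dismissed it.
    """
    suspicious = 0.05  # assumed threshold, chosen only for illustration
    if confirmed is True:
        return Status.RED
    if confirmed is False or error_rate < suspicious:
        return Status.GREEN
    # Elevated errors and no verdict yet: say so instead of staying green.
    return Status.UNCONFIRMED
```

The point of the third state is that the page can change automatically without anyone having to sign off on a full "red".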

Status pages are generally so unreliable that we do our own monitoring of external cloud resources that we depend on.
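
For anyone curious what "our own monitoring" can look like at its simplest, here is a rough sketch using only the Python standard library. The function names, the retry count, and the choice of a plain HTTP GET as the health signal are all assumptions for illustration, not a description of any particular setup.

```python
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # Any failure (DNS, timeout, HTTP error, bad URL) counts as "not healthy".
        return False


def is_down(probe: Callable[[], bool], attempts: int = 3) -> bool:
    """Treat the dependency as down only if every attempt fails."""
    return not any(probe() for _ in range(attempts))
```

Usage would be something like `is_down(lambda: http_probe(url))` for a `url` you control, run on a schedule from outside the provider you are checking.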
So... then what's the point of a status dashboard?
> So... then what's the point of a status dashboard?

Exactly. Apparently it's just a marketing tool if you believe parent comments...

Wow, so much for their "leadership principles", the first one being "customer obsession", along with "earning trust". From what I see, this doesn't accomplish either :|
I’ve got another good FAANG principle joke:

“Don’t be evil”

buys doubleclick