HN Mirror

by Mauricebranagh 1831 days ago

A bit surprising that they didn't override this, as there was no fire, and take the hit from the over-temperature.

Also, they didn't have breathing gear (and trained staff), which would let you go in and restart without waiting and, in case of an accident, try to rescue people.

Back when I worked in R&D, for the lab where we could have had a Freon leak, we had breathing gear just outside and some people trained to use it.

2 comments

doikor 1831 days ago

> Also, they didn't have breathing gear (and trained staff), which would let you go in and restart without waiting and, in case of an accident, try to rescue people.

At that point, without actually going in and checking, they have no real way of knowing whether there really is or was a fire. So the proper procedure is to let the professionals handle it (wait for the fire department to clear the building). No amount of server downtime is worth sending a "not a firefighter" into a possibly burning building.

And facilities like these have strict control over where people can be, so they know whether someone is in there without going in to check.

witrak 1831 days ago

> So the proper procedure is to let the professionals handle it (wait for the fire department to clear the building).

Nevertheless, having breathing gear could allow recovery to begin right after the fire department finished its procedures. This would shorten recovery time.

Mauricebranagh 1831 days ago

That was my point, and large industrial sites quite often have their own internal fire service.

Mauricebranagh 1831 days ago

Video cameras maybe, or sensors that detect combustion products :-)

I would hope that they do audit people in and out, so in case of accidents you can account for everyone.

Oh, and these would be trained people.

darkcha0s 1831 days ago

I'll gladly take a few minutes of outage if it means some guy doesn't have to run into an oxygenless building with nothing but a breathing apparatus to restart my server.

tetha 1831 days ago

Also, if you follow AWS HA guidelines, this does not lead to a service outage. We were affected by this, and it knocked a dozen or two systems offline for 6 hours or so. AZ redundancy took over, that was it, and on-call went back to sleep.

nix23 1831 days ago

Just imagine if AWS were used to save lives, e.g. for medical information.

darkcha0s 1828 days ago

If you are using a single region/DC to store safety-critical data, you're already doing it wrong, and whoever handles your disaster recovery plan should be fired.

nix23 1828 days ago

You mean that if you're using an American company for privacy-related data, you should be fired?

paranoidrobot 1831 days ago

I was about to reply that AWS shouldn't be relied on for safety-critical systems, but someone is probably already doing that.

I'll revise that to: I hope that whoever is relying on AWS for safety-critical systems at least does it over many regions. It's still dumb, because even AWS occasionally has global/multi-region outages, but at least it hopefully reduces the chance of it.

sofixa 1831 days ago

> I was about to reply that AWS shouldn't be relied on for safety-critical systems, but someone is probably already doing that

Wtf, why not? It's drastically easier, and probably cheaper, to achieve that level of redundancy with AWS than doing it yourself.

> It's still dumb, because even AWS occasionally has global/multi-region outages

Really? Like when? The only one you could potentially claim was multi-region was when S3 in us-east-1 was down, and with the old default behaviour (if you didn't specify where your S3 bucket was, requests would pass through us-east-1 to ask where it was) that impacted lazy code that had nothing to do with us-east-1. That's almost entirely on developers, though, so it's hard to claim it was a multi-region or global outage.