Hacker News
by hueving 3607 days ago
Absolutely none of that is an excuse for having an entire airline dependent on a single data center. It's about redundancy. You centralize administration, not the control plane itself. Database quorums, load balancers, and DNS updates solved these problems a long time ago.
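To make the quorum point concrete, here is a minimal sketch of quorum replication across several data centers. It assumes N replicas, a write that must reach W of them and a read that consults R of them, with R + W > N so every read overlaps the latest successful write. The class names, data center names and the "flight-123" record are hypothetical, and the single client-side version counter is a simplification that real systems replace with proper versioning or consensus.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data, hosted in a hypothetical data center."""
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)  # key -> (version, value)


class QuorumStore:
    """Illustrative quorum-replicated key-value store, not production code."""

    def __init__(self, replicas, write_quorum, read_quorum):
        # R + W > N guarantees any read quorum overlaps any write quorum.
        if read_quorum + write_quorum <= len(replicas):
            raise ValueError("need R + W > N for reads to see the latest write")
        self.replicas = replicas
        self.w = write_quorum
        self.r = read_quorum
        self.version = 0  # simplification: one client assigns versions

    def write(self, key, value):
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < self.w:
            raise RuntimeError(f"write failed: {len(live)} replicas up, need {self.w}")
        self.version += 1
        for rep in live:
            rep.store[key] = (self.version, value)

    def read(self, key):
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < self.r:
            raise RuntimeError(f"read failed: {len(live)} replicas up, need {self.r}")
        # Consult R live replicas; the highest version among them is the latest write.
        answers = [rep.store.get(key, (0, None)) for rep in live[: self.r]]
        return max(answers, key=lambda pair: pair[0])[1]


dcs = [Replica("dc-east"), Replica("dc-central"), Replica("dc-west")]
store = QuorumStore(dcs, write_quorum=2, read_quorum=2)
store.write("flight-123", "gate B7")
dcs[0].up = False                      # lose an entire data center
print(store.read("flight-123"))        # gate B7
store.write("flight-123", "gate C2")   # still succeeds with 2 of 3 up
dcs[0].up, dcs[2].up = True, False     # a different data center fails
print(store.read("flight-123"))        # gate C2
```

With three data centers and majority quorums (W = R = 2), the system keeps accepting reads and writes with any one site down, which is the property a single data center cannot provide.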

At this point I consider it incompetence in the IT department for a company this large to have such a rudimentary single point of failure. We wouldn't be so forgiving if Delta needlessly kept all of its pilots in one city overnight, so that a single storm could wipe out every flight.


You centralize administration, not the control plane itself.

You're still out of luck when the centralized admin center goes down, though. That's where all the humans doing the coordination and dispatching work are. Having a bunch of extra data centers and backup generators around the country won't make those humans reachable.

And building out full redundant continuity of everything, including the humans, is not something that tends to happen outside of major governments.

That's not what happened! It's the computer system that failed. The entire administrative team didn't just up and die.

Also, "centralize administration" just means that you can control everything from a single location. It doesn't preclude being able to control from multiple locations.

Think of AWS: you can control everything across multiple data centers through a centralized interface, from anywhere with an Internet connection, even if entire data centers go down.

A sane system should essentially allow Delta to operate seamlessly from many possible locations, as long as it has the required human operators.
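A minimal sketch of that idea, assuming the control plane is deployed in more than one data center and an operator console simply sends each command to the first healthy region in its preference list. The `Region` class, the `submit` function, the region names and the commands are all hypothetical. A real system would also replicate accepted commands so every region shares the same state; this sketch only shows that the operator's own location stops mattering.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A hypothetical control-plane deployment in one data center."""
    name: str
    healthy: bool = True
    log: list = field(default_factory=list)  # commands this region accepted


def submit(command, regions_by_preference):
    """Send an operator command to the first healthy region.

    Any console that can reach any healthy region keeps working,
    regardless of where the operator sits.
    """
    for region in regions_by_preference:
        if region.healthy:
            region.log.append(command)
            return region.name
    raise RuntimeError("no healthy control-plane region reachable")


east, west = Region("dc-east"), Region("dc-west")
# Two operators in different cities, each preferring their nearest region.
print(submit("hold flight 123", [east, west]))  # dc-east
east.healthy = False                            # lose a whole data center
print(submit("reroute crew 45", [east, west]))  # dc-west
print(submit("hold flight 678", [west, east]))  # dc-west
```

The design choice is that the consoles, not the data centers, are the interchangeable part: losing a site only changes which region a command lands in.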