Hacker News
by foobarbazetc 1828 days ago
I like how something is “auto-healing” when it’s like… has `Restart=on-failure` in systemd.

Anyway, it’s always DNS. Always.

“Unfortunately, that allowed something as simple as a corrupted file to crash down multiple layers of redundancy with no real way of bringing things back up.”

You can spend many, many millions of $ on multi-AZ Kubernetes microservices blah blah blah and it’ll still be taken down by a SPOF, which, 99% of the time, is DNS.

Actual redundancy, as opposed to “redundancy”, is extremely difficult to achieve because the incremental costs of one more 9 are almost exponential.
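To put rough numbers on "one more 9", here is a minimal sketch of the yearly downtime budget at each availability level. It only covers the budget side: each extra nine cuts the allowed downtime by 10x. It says nothing about what it costs to get there, and the function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for `nines` nines of availability (3 -> 99.9%)."""
    return MINUTES_PER_YEAR * 10 ** -nines


def downtime_table(max_nines: int = 5) -> list[tuple[int, float]]:
    """Pairs of (number of nines, allowed minutes of downtime per year)."""
    return [(n, allowed_downtime_minutes(n)) for n in range(1, max_nines + 1)]


if __name__ == "__main__":
    for n, minutes in downtime_table():
        print(f"{n} nines: {minutes:12.2f} min/year")
```

At three nines that is about 8.8 hours a year; at five nines it is a little over five minutes, so a single multi-hour outage uses up decades of budget.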

And then a customer updates their configuration and your entire global service goes down for hours, à la Fastly.

Or a single corrupt file crashes your entire service.

1 comment

>Anyway, it’s always DNS. Always.

Which is disappointing. DNS is an infrastructure where the backend is VERY easy to make highly redundant. It gets thwarted by decisions not to do that easy work, or by client libraries that don't take advantage of the redundancy that is there.
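For the client-library side, a rough sketch of what "taking advantage of it" could look like: try each configured resolver in turn instead of giving up when the first one fails. Everything here is hypothetical (the `Resolver` type, `resolve_with_fallback`, the idea that each resolver is a plain callable); it is not how any particular library does it.

```python
from typing import Callable, Iterable, List

# Hypothetical resolver interface: takes a hostname, returns IP address strings,
# and raises OSError (socket.gaierror is a subclass) when the lookup fails.
Resolver = Callable[[str], List[str]]


def resolve_with_fallback(name: str, resolvers: Iterable[Resolver]) -> List[str]:
    """Return the first non-empty answer, trying the resolvers in order."""
    errors = []
    for resolver in resolvers:
        try:
            addresses = resolver(name)
        except OSError as exc:
            # This resolver is down or broken; remember why and try the next one.
            errors.append(exc)
            continue
        if addresses:
            return addresses
    raise OSError(f"all resolvers failed for {name!r}: {errors}")
```

The same loop applies one level down: if a name resolves to several addresses, the client should try the next address on a connection failure rather than stopping at the first.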