	by zzzcpan 2167 days ago
	In web and online infrastructure, pretty much nothing is out of your control except for two things: the ISPs people use and the domain name registrar you use for your domain name. And even registrar centralization can be mitigated by having multiple domains from multiple registrars, promoting different domains to different users, and having backup communication channels to inform users about new domains in case something happens. Other than that, it's your choice whether to make your infrastructure dependent on a bunch of unreliable centralized SPOFs from big corporations or to build highly available infrastructure relying on servers from many different providers, running your own DNS servers with DNS routing, failover, etc. You will definitely beat Cloudflare's availability this way many times over.
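For a rough sense of the arithmetic behind the multi-provider claim, here is a minimal sketch. It assumes provider failures are statistically independent, which is exactly the kind of assumption the replies below question, and the availability figures are hypothetical, not measurements of any real provider.

```python
from math import prod


def combined_availability(availabilities):
    """Probability that at least one provider is up.

    Assumes failures are independent; correlated failures (a shared
    registrar, a shared upstream, a common software bug) make the real
    number lower than this estimate.
    """
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figures for illustration only.
single_provider = 0.999
three_providers = combined_availability([0.999, 0.999, 0.999])

print(f"one provider:    {single_provider:.6f}")
print(f"three providers: {three_providers:.9f}")
```

The estimate only holds as long as the failover mechanism itself works and the providers do not share a hidden dependency.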

1 comments

katzgrau 2167 days ago

And you will still be exposed to being blindsided by something out of your control. It's really only in your control if you can think of it and plan for it ahead of time. And there will certainly be things that we don't consider. You can call that a failure, but it happens all the time and it's reality.

What if a political event impacts you, for instance? A pandemic? A storm taking out a major data center? A weird Linux kernel edge case that only happens beyond a certain point in time? That only sounds ridiculous because it hasn't happened, but weird things like that happen all the time. There are so many unseen possibilities.

I understand that might sound unreasonable or facetious or like I'm expanding the scope.

The point is, the more confident you are that you've built something with no SPOF, the more exposed you are to the risk of one, because one probably does exist.


zzzcpan 2167 days ago

Honestly, you are not making any sense. This is not how engineering works. If you design for resilience, you get more resilience, and you build confidence as you see evidence of how the system works in the real world. Furthermore, with resilience you always have to cover all risks. It's just that you don't immediately reach fine-grained decisions that avoid triggering failover to servers in different countries; you improve granularity as you learn from actual operations and modify your designs accordingly.

I remember when I first deployed a DNS-routed system: it was too reactive, constantly jumping between servers. Monitoring was too sensitive, it didn't wait for servers to stabilize before returning them to the mix, and exponential backoff was taking servers out for far too long. But even given all that, it was still able to avoid outages caused by data center failures and connectivity problems.
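The tuning problems described above (flapping, over-sensitive monitoring, returning servers too early, unbounded backoff) map onto a few familiar knobs. The sketch below is not the commenter's system; the class name `ServerHealth`, its thresholds, and the timing values are all hypothetical, chosen only to show hysteresis plus a capped exponential hold-down.

```python
from dataclasses import dataclass


@dataclass
class ServerHealth:
    # All defaults are illustrative assumptions, not recommended values.
    fail_threshold: int = 3      # consecutive failed probes before removal
    recover_threshold: int = 5   # consecutive good probes before return
    base_hold_down: float = 10.0   # seconds out of rotation after first removal
    max_hold_down: float = 300.0   # cap so backoff cannot grow without bound
    in_rotation: bool = True
    fails: int = 0
    successes: int = 0
    removals: int = 0

    def record(self, ok: bool) -> None:
        """Feed in one health-probe result."""
        if ok:
            self.fails = 0
            self.successes += 1
            # Wait for a stable run of good probes before returning the server.
            if not self.in_rotation and self.successes >= self.recover_threshold:
                self.in_rotation = True
        else:
            self.successes = 0
            self.fails += 1
            # A single failed probe is ignored; only a run of failures removes.
            if self.in_rotation and self.fails >= self.fail_threshold:
                self.in_rotation = False
                self.removals += 1

    def hold_down(self) -> float:
        """Seconds to keep the server out, doubling per removal up to the cap."""
        if self.removals == 0:
            return 0.0
        return min(self.max_hold_down, self.base_hold_down * 2 ** (self.removals - 1))
```

A real deployment would probably also decay `removals` after a long stable period; this sketch leaves that out to stay short.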


katzgrau 2167 days ago

It does make sense, and it's paradoxical, I know.

> If you design for resilience, you get more resilience, and you build confidence as you see evidence of how the system works in the real world.

You simply can't foresee or eliminate all risk. This is referred to as "the turkey problem." It's not my idea, but one I certainly subscribe to.

https://www.convexresearch.com.br/en/insights/the-turkey-pro...


zzzcpan 2167 days ago

The whole idea behind resilience is to cover unforeseeable risks; the turkey problem just doesn't apply here. I would even say that if a system doesn't solve the turkey problem, it cannot be called resilient. And high availability without resilience is not practically possible.


katzgrau 2167 days ago

> The whole idea behind resilience is to cover unforeseeable risks

Speaking of things that don't make sense... if it's unforeseeable, one will have a difficult time adequately preparing for it.


zzzcpan 2167 days ago

It's not difficult, it's just different. It's the difference between predicting that a truck might crash into a data center and building a concrete wall around it, and designing a system in such a way that users only ever resolve to servers that are currently available, regardless of what happened to some of them in a data center that had a truck crash into it.
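The "only ever resolve to servers that are currently available" idea can be sketched as a selection step run before each DNS answer. This is a simplified illustration, not the commenter's implementation: the function name `pick_records`, the health map, and the addresses (taken from the reserved documentation ranges) are assumptions.

```python
def pick_records(servers, healthy, max_records=2):
    """Return up to max_records addresses whose latest health check passed.

    servers: ordered list of candidate addresses.
    healthy: dict mapping address -> bool from some external monitor
             (hypothetical; unknown addresses are treated as down).
    """
    live = [ip for ip in servers if healthy.get(ip, False)]
    # If monitoring believes everything is down, answer with the full list
    # rather than nothing: the monitor itself may be what is broken.
    pool = live or list(servers)
    return pool[:max_records]


# Hypothetical servers at three different providers.
SERVERS = ["192.0.2.10", "198.51.100.20", "203.0.113.30"]

# The cause of the first server's failure is irrelevant to the selection;
# only the fact that it currently fails its checks matters.
status = {"192.0.2.10": False, "198.51.100.20": True, "203.0.113.30": True}
print(pick_records(SERVERS, status))
```

Resolvers cache answers for the record's TTL, so clients can keep using a dead address for that long; this approach depends on short TTLs and on the monitor's view matching what users actually experience.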


katzgrau 2167 days ago

... and after you've solved for the truck problem, you have a potentially infinite list of other things to plan for, some of which you will not foresee. And of course, there's probably an upper bound on the time you can spend preparing for such things.

Famous to the point of being a cliche, the Titanic was thought to be unsinkable, and I would have had a similarly hard time convincing the engineers behind the ship's design to believe otherwise.

The level of confidence you're displaying in predicting the unforeseeable is something you may want to take a deeper look at.
