by santaragolabs 3363 days ago
Here's one that is maybe more concrete. And I hope I'm understanding everything correctly.

Say you're a startup running your infrastructure in AWS. You spread it out over three different regions and within each region you use 2 availability zones. Your network load is automatically balanced over these three geographical regions.

Now an earthquake happens in one region and, although it's unlikely, both availability zones within that region go offline (fiber to the region is cut, power loss, whatever). This means the entire region goes offline.

If modeled properly you should now be able to figure out what the consequences of this will be for the entire infrastructure. Will you be able to stay online if surprising behavior (an entire region going offline) happens?
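
To make that a bit more tangible, here is a rough sketch of the simplest possible version of such a model. It is plain enumeration rather than a real probabilistic program, and every number in it is made up: the per-AZ failure probability, the probability of a region-wide event, and the assumption that any two of the three regions can carry the full load are all mine, not anything AWS publishes. The function names are just for illustration.

```python
import itertools

REGIONS = 3
AZS_PER_REGION = 2
P_AZ_DOWN = 0.01         # assumed: independent per-AZ failure probability
P_REGION_EVENT = 0.001   # assumed: region-wide event (earthquake, fiber cut)
MIN_REGIONS_NEEDED = 2   # assumed: two surviving regions can carry the load


def p_region_down(p_az=P_AZ_DOWN, p_event=P_REGION_EVENT, azs=AZS_PER_REGION):
    """A region is down if a region-wide event hits it, or if (absent
    such an event) all of its AZs happen to fail independently."""
    return p_event + (1 - p_event) * p_az ** azs


def p_service_up(p_region, regions=REGIONS, min_needed=MIN_REGIONS_NEEDED):
    """Sum the probability of every up/down combination of regions in
    which at least `min_needed` regions are still up."""
    total = 0.0
    for state in itertools.product([True, False], repeat=regions):
        if sum(state) >= min_needed:
            prob = 1.0
            for is_up in state:
                prob *= (1 - p_region) if is_up else p_region
            total += prob
    return total


if __name__ == "__main__":
    p_down = p_region_down()
    print("P(region down)  =", p_down)
    print("P(service up)   =", p_service_up(p_down))
```

With three regions you can enumerate all eight states by hand like this. The point of the tooling is that once you add correlated failures, dependencies between services and continuous quantities like load, enumeration stops being an option.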

Of course the big issue here is always mapping real-world scenarios onto models that fit well enough.

EDIT: It's a matter of taking the "nasty integral" part out of it, as per nerdponx in another comment on this thread. This can really help with doing Fault Tree Analysis, for example, as the statistics-solving part there has always been a big problem for sufficiently large systems (MCMC solvers help only to a degree).