by chias
1040 days ago
This was an interesting bit of math I did when I joined a startup. It's pretty counter-intuitive to think about how very large numbers can increase the importance of small ones.
Say the company you work for is worth $10,000,000, and that you're hosted on GCP. Now take your best guess: what do you think the likelihood is of e.g. a fire or earthquake or something occurring in all relevant Google infrastructure simultaneously*, basically ushering in the end of all of your infrastructure, data, and backups? Frame that in a number of years. Is this kind of event something that may happen once in a thousand years? Once in ten thousand years?
Let's say this is the sort of thing that might happen once in ten thousand years -- that's a long time! Then the cost of this particular risk to your company is $10,000,000 / 10,000 years = $1000 / year.
This kind of math isn't just a toy. When you have questions like "would maintaining actual physical backups in a safe somewhere outside of GCP be worth it?", you now have a framework to answer them ("if it would cost less than $1000 per year, then yes").
* or substitute in your favorite company-ending event.
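The arithmetic above could be sketched in a few lines of Python. This is only a rough illustration: the function names are made up here, the event is assumed to wipe out the full company value, and the mitigation is assumed to remove the risk entirely. The $600 backup cost is a hypothetical figure, not one from the comment.

```python
def annualized_risk_cost(company_value: float, years_between_events: float) -> float:
    """Expected yearly loss from an event assumed to destroy the whole company."""
    if years_between_events <= 0:
        raise ValueError("years_between_events must be positive")
    # A once-in-N-years event is treated as a 1/N chance in any given year.
    return company_value / years_between_events


def mitigation_is_worth_it(mitigation_cost_per_year: float,
                           company_value: float,
                           years_between_events: float) -> bool:
    """True if the mitigation costs less per year than the risk it is assumed to remove."""
    return mitigation_cost_per_year < annualized_risk_cost(
        company_value, years_between_events
    )


# Numbers from the comment: a $10,000,000 company, a once-in-10,000-years event.
risk = annualized_risk_cost(company_value=10_000_000, years_between_events=10_000)
print(f"Risk cost: ${risk:,.0f} / year")

# Hypothetical: offsite physical backups that cost $600 per year.
print(mitigation_is_worth_it(600, 10_000_000, 10_000))
```

If the mitigation only reduces the risk rather than removing it, the comparison would need the reduction in risk cost instead of the full figure.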
|
This avoids two nasty problems with trying to express risk as an expected value.
The first is that it is hard to express all kinds of probabilities and damages numerically. Not all kinds of damages convert easily to money, and some probabilities are hard to guess (you quickly end up with uncertain probabilities, and an expected value just flattens those into a single average again). Even without those issues, pinning a number on it can lead to lots of discussion (good if you want discussion, not so good if you want to get shit done).
The second is that you easily fall into the trap of assuming everything has an average, and that the law of large numbers applies. While physics kind of helps you there by putting hard limits on the maximum amount of damage possible, you may end up in a situation where all the nasty stuff is in the long, improbable tail. A good example is earthquakes: shaking amplitude increases tenfold for every point on the Richter scale, but frequency also only decreases about tenfold, so what then is the average?
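To make the earthquake point concrete, here is a toy sketch that takes the tenfold/tenfold figures literally. It assumes damage scales as 10^m and frequency as 10^-m in arbitrary units, which is a simplification for illustration and not a seismological model. The function names are hypothetical.

```python
from fractions import Fraction


def expected_loss_contributions(max_magnitude: int) -> list[Fraction]:
    """Per-magnitude contribution to expected damage in the toy model."""
    contributions = []
    for magnitude in range(1, max_magnitude + 1):
        damage = 10 ** magnitude                    # assumed: tenfold per point
        frequency = Fraction(1, 10 ** magnitude)    # assumed: tenfold rarer per point
        contributions.append(damage * frequency)
    return contributions


def truncated_expected_loss(max_magnitude: int) -> Fraction:
    """Expected damage when magnitudes are capped at max_magnitude."""
    return sum(expected_loss_contributions(max_magnitude), Fraction(0))


# Every magnitude contributes the same amount, so the total is set by the cap.
for cap in (5, 7, 9):
    print(cap, truncated_expected_loss(cap))
```

Under these assumptions each magnitude contributes exactly 1 unit, so the "average" is just the cap you chose: 5 with a cap of 5, 9 with a cap of 9. The sum never settles on its own, and the only thing that bounds it is the physical limit on the largest possible event.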
Something else that's not really a big problem, but worth thinking about: some of these eventualities may very well cause you damage but are beyond your sphere of influence. Sure, you should try to avoid going bankrupt if someone knocks over a server rack, but if all Google data centres go down over an entire continent you've got bigger fish to fry. So concentrating on the things you can do something about is a helpful way to keep focused.