by chias 1040 days ago
This was an interesting bit of math I did when I joined a startup. It's pretty counter-intuitive to think about how very large numbers can increase the importance of small ones.

Say the company you work for is worth $10,000,000, and that you're hosted on GCP. Now take your best guess: what do you think the likelihood is of e.g. a fire or earthquake or something occurring in all relevant Google infrastructure simultaneously*, basically ushering in the end of all of your infrastructure, data, and backups? Frame that in a number of years. Is this kind of event something that may happen once in a thousand years? Once in ten thousand years? Let's say this is the sort of thing that might happen once in ten thousand years -- that's a long time!

Then the cost of this particular risk to your company is $1000 / year.

This kind of math isn't just a toy. When you have questions like "would maintaining actual physical backups in a safe somewhere outside of GCP be worth it?", you now have a framework to answer them ("if it would cost less than $1000 per year, then yes").
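To make the arithmetic explicit, here is a minimal Python sketch of the same calculation. The function names are made up for illustration, and it assumes the simplest possible model: the event wipes out the full company value, it happens on average once per return period, and the mitigation removes the risk entirely.

```python
def annualized_risk_cost(loss_if_event: float, return_period_years: float) -> float:
    """Expected yearly loss for an event that happens about once per return period."""
    if return_period_years <= 0:
        raise ValueError("return_period_years must be positive")
    return loss_if_event / return_period_years


def mitigation_worth_it(
    mitigation_cost_per_year: float,
    loss_if_event: float,
    return_period_years: float,
) -> bool:
    """True if the mitigation costs less per year than the risk it removes.

    Assumption: the mitigation eliminates the risk completely.
    """
    return mitigation_cost_per_year < annualized_risk_cost(
        loss_if_event, return_period_years
    )


company_value = 10_000_000  # dollars, from the example above
return_period = 10_000      # years between company-ending events (a guess)

print(annualized_risk_cost(company_value, return_period))      # 1000.0
print(mitigation_worth_it(800, company_value, return_period))  # True
```

If the guess moves to once in a thousand years, the same backups are worth up to $10,000 per year, which is why the guess deserves more thought than the division.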

--

* or substitute in your favorite company-ending event.

3 comments

This is similar, but one of the benefits of thinking about 'things that mustn't happen' and relating them to 'things that can go wrong and how to prevent them' is that it avoids talking about expected damage.

This avoids two nasty problems with trying to express risk as an expected value.

The first is that it is hard to express all kinds of probabilities and damages numerically. Not all kinds of damages convert easily to money, and some probabilities are hard to guess (you quickly get uncertain probabilities, but expected values just flatten those into an average again). Even without those issues, pinning a number on it can lead to lots of discussion (good if you want discussion, not so good if you want to get shit done).

The second is that you easily fall into the trap of assuming everything has an average, and that the law of large numbers applies. While physics kind of helps you there by putting hard limits on the maximum amount of damage possible, you may end up in a situation where all the nasty stuff is in the long, improbable tail. A good example is earthquakes: shaking amplitude increases tenfold for every point on the Richter scale, but frequency also only decreases roughly tenfold, so what then is the average?
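A rough Python sketch of why the average misbehaves here. It takes the simplification above at face value: each step up the scale is ten times as damaging and ten times as rare. Real earthquake damage does not scale this neatly, and the function name is made up for illustration.

```python
from fractions import Fraction


def expected_damage_contributions(max_magnitude: int) -> list:
    """Per-magnitude share of expected yearly damage, in arbitrary units.

    Assumption: damage grows tenfold and frequency shrinks tenfold
    with every step up the scale.
    """
    contributions = []
    for step in range(max_magnitude):
        frequency = Fraction(1, 10 ** step)  # events per year
        damage = 10 ** step                  # damage per event
        contributions.append(frequency * damage)
    return contributions


# Every magnitude contributes the same amount, so the "average"
# depends entirely on where you decide to cut off the tail.
for cutoff in (3, 6, 9):
    print(cutoff, sum(expected_damage_contributions(cutoff)))
```

Under these assumptions the total simply equals the cutoff, so the expected value says more about the largest event you are willing to consider than about the risk itself.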

Something else that's not really a big problem, but worth thinking about: some of these eventualities may very well cause you damage but are beyond your sphere of influence. Sure, you should try to avoid going bankrupt if someone knocks over a server rack, but if all Google data centres go down over an entire continent, you've got bigger fish to fry. So concentrating on the things you can do something about is a helpful way to stay focused.

While I agree with your math, I am curious how much your physical backups would be worth if something so catastrophic occurred that all of Google/AWS/Azure cloud services were destroyed. Whether that be an act of war, a massive solar flare, etc., would it even matter anymore that you had those backups?

Similarly:

"We are spending $50 per month just for one test in our code. We could cut it down to $10 if we wanted."

"How many hours would it take to reduce the spend? If it's more than a couple of hours for a senior engineer, then it's not worth it."

We kept spending money on this inefficient test and it was the right choice.
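A back-of-the-envelope Python sketch of that trade-off. The hourly cost of a senior engineer is not given above, so the $100/hour figure is purely a placeholder assumption, as is the function name.

```python
def payback_months(
    engineer_hours: float,
    hourly_cost: float,
    current_monthly_spend: float,
    reduced_monthly_spend: float,
) -> float:
    """Months of savings needed to repay the engineering time spent."""
    monthly_savings = current_monthly_spend - reduced_monthly_spend
    if monthly_savings <= 0:
        raise ValueError("the change must actually reduce the monthly spend")
    return engineer_hours * hourly_cost / monthly_savings


HOURLY_COST = 100.0  # hypothetical fully loaded cost, not from the thread

for hours in (1, 2, 8):
    months = payback_months(hours, HOURLY_COST, 50, 10)
    print(f"{hours} h of work pays for itself after {months} months")
```

This ignores opportunity cost, meaning what else the engineer could have shipped in those hours, which is often the real reason a small saving is not worth chasing.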

Reciprocally, when low wages are available in manufacturing, companies are less likely to use automated processes because labor is so cheap that it can cost less to, e.g., have two buckets/wheelbarrows between two parts of a factory line with one person to swap them rather than use a conveyor belt. Getting to 100% automation would allow factories to come back to the US, but getting that last 20% at a competitive cost is difficult. See "America’s largest tool company couldn’t make a wrench in America" (wsj.com).[0]

[0] https://news.ycombinator.com/item?id=36828861