by bongodongobob 671 days ago
Is that $10k in lost sales/product during the downtime, or $10k + the cost of IT standing things back up, verifying, and resyncing + the cost of other departments manually fixing adjacent things that broke?

People who don't work daily in infra tend not to understand that downtime like that can have massive ripple effects. That one server, unknown to you, might have tentacles that reach all over the company. It might generate 100 tickets that now need to be verified by various IT personnel over the next few days, on top of their likely already full workload. It might have fucked up backups, DFS, patching cadence, etc.

1 comment

Sure, the approach I advocated for can have much worse consequences in general. However, in this particular case it was ~impossible for the outage to get that costly - we operated the servers and knew the blast radius. My estimate was for the total cost.

Also, 2 engineers working for 3 months cost a ton of money, not even counting the opportunity cost of other things they could’ve been doing. Even if the potential outage cost were closer to $100k, I’d likely still stick with my decision.