by rconti
2523 days ago
I support you and hope you can reclaim yourself. A lot of good advice has been given here. One thing I realized when I stepped out of Ops is that many of the life-and-death situations we felt were, to a certain extent, self-imposed. We always thought the world was going to end if $bigcustomer became dissatisfied that the site was down (or whatever). In the end, it turned out that, as long as we did our best, it didn't really matter. The business was constantly making best guesses at allocation of resources (human, equipment, etc.) and hedging that against the customer experience. 100% reliability was never the goal. Violating SLAs and potentially having to give the customer a service credit was just part of the calculus. We took it upon ourselves to treat keeping the service up as the most important thing in the world, because that's how it was presented to us, but to the business it was not nearly as important. If you can decouple yourself from these stressors somewhat and keep them at arm's length, you'll be doing yourself a great service.

Seeing more of these large companies (Google, Facebook, Cloudflare) have significant outages has helped my confidence a ton. We are all human and make mistakes and can’t always figure out the fix immediately.
I’m just glad that, as far as I know, none of the outages I was responsible for were life-and-death situations. That is stress I don’t want.