Hacker News
by theflyinghorse 1756 days ago
> and sometimes the fix is not as easy as restarting a server.

Yes, but more often than not you just restart a couple of nodes, maybe kill and re-up a couple of containers, and that's about it. The question is: do the cases where you really have to dig in outweigh those where restarts just work?

The problem is not how often disasters happen; the problem (for me) is that they will happen and you will have to fix them. Every minute the servers are down, the company is losing money, and the mere possibility of that ruins everything. That’s why I, as a software engineer, don’t do on-call either. Money doesn’t fix my stress.
Genuinely asking: why aren’t the restarts automated, then?

Or does one only get paged after the automated restarts have also failed?

Edit: fix typo
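The setup asked about above (automated restarts first, a human paged only once those fail) can be sketched as a small escalation loop. This is a hypothetical Python sketch, not the API of any real monitoring or orchestration tool: `is_healthy`, `restart`, and `page_on_call` are placeholder callables you would wire to your own health check, container runtime, and paging system, and the limit of three restarts is an arbitrary assumption.

```python
from typing import Callable


def restart_or_page(
    is_healthy: Callable[[], bool],      # hypothetical health check
    restart: Callable[[], None],         # hypothetical "kill and re-up" action
    page_on_call: Callable[[str], None], # hypothetical pager hook
    max_restarts: int = 3,               # assumed limit, tune per service
) -> bool:
    """Try automated restarts; page a human only if they all fail.

    Returns True if the service recovered without paging anyone.
    """
    for _ in range(max_restarts):
        if is_healthy():
            return True
        restart()
    # Give the last restart a chance to count before escalating.
    if is_healthy():
        return True
    page_on_call(f"service still unhealthy after {max_restarts} automated restarts")
    return False


class FakeService:
    """Stand-in service that recovers after a fixed number of restarts."""

    def __init__(self, restarts_needed: int) -> None:
        self.restarts_needed = restarts_needed
        self.restarts = 0

    def is_healthy(self) -> bool:
        return self.restarts >= self.restarts_needed

    def restart(self) -> None:
        self.restarts += 1


pages: list[str] = []

# The common case: a couple of restarts fix it and nobody is woken up.
flaky = FakeService(restarts_needed=2)
flaky_ok = restart_or_page(flaky.is_healthy, flaky.restart, pages.append)

# The "really have to dig in" case: restarts don't help, so a human is paged.
broken = FakeService(restarts_needed=10)
broken_ok = restart_or_page(broken.is_healthy, broken.restart, pages.append)
```

Under this scheme the on-call engineer only sees the second kind of incident, which is exactly the case the first comment worries about: the pages that do arrive are the ones a restart could not fix.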