Hacker News
by markild 920 days ago
This all depends on the risks involved.

If I can handle it failing, sure, go ahead. There are so many variables that could be involved, not uncommonly including temporal ones. I don't think simply monitoring for a period and calling it healthy is any guarantee.

I do very much think you are right though. Being too risk averse will grind everything to a halt.

Your whole process has to be designed around avoiding these issues. Allow failures, fix continuously and _quickly_, don't repeat mistakes.