by dredmorbius 5142 days ago
Brief non-operation (a reboot or service restart) is often better than a prolonged outage, particularly where SLAs are set to create an expectation and acceptance of this, and where redundancy exists.

I'm thinking too that there's a feedback process at work here, and some sort of damping mechanism would help with that.

1 comment

Agreed, and many architectures are designed to have components "transparently fail" without impact on overall operation. When you have forced failures, feedback/damping is absolutely required. However, in my experience most such failures are unplanned and unknowable at the outset, and you can only dampen conditions that are predictable.
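
To make the damping idea concrete for the forced-restart case, here is a minimal sketch of one possible mechanism: a sliding-window limit on how many restarts may be triggered in a given period. The class name `RestartDamper` and the parameters `max_restarts`, `window`, and `clock` are hypothetical, chosen only for illustration; nothing in the thread specifies a particular scheme.

```python
import time
from collections import deque


class RestartDamper:
    """Hypothetical damper: permit at most `max_restarts` per `window` seconds."""

    def __init__(self, max_restarts=3, window=60.0, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window = window
        self.clock = clock        # injectable so the logic can be tested
        self._history = deque()   # timestamps of recently permitted restarts

    def allow_restart(self):
        now = self.clock()
        # Forget restarts that have aged out of the window.
        while self._history and now - self._history[0] >= self.window:
            self._history.popleft()
        if len(self._history) >= self.max_restarts:
            return False          # too many recent restarts: damp the loop
        self._history.append(now)
        return True
```

A supervisor would call `allow_restart()` before each forced restart and escalate (alert, fail over) when it returns `False`. This only damps the predictable, self-inflicted restarts; it does nothing for the unplanned failures mentioned above.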