by strictfp
4062 days ago
Error detection is, unfortunately, similar if not identical to the halting problem in most cases, and heartbeating is no exception. For each system there is a single state which exhibits correct behaviour and an infinite set of states which produce incorrect behaviour. Detecting an error correctly would therefore, in theory, require testing every possible code path and input. This is of course impractical, so we try to find a middle ground, but in my experience that middle ground is far too simplistic to be of real use in anything but the simplest failure cases. For this reason, I prefer to spend more effort on proactive error prevention than on reactive error handling. Time spent improving the stability of the product generally has a better payoff than adding fault detection and recovery, which IMO should only be used as a belt-and-braces way of returning the system to a known state. There is always another type of failure which your error detection cannot detect, so you should never rely on it.
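
To make the heartbeating point concrete, here is a minimal sketch in Python. Everything in it is hypothetical and made up for illustration (`HeartbeatMonitor`, `buggy_add`, the 5-second timeout); it is not taken from any real system. It only shows the kind of failure a liveness check cannot see: a worker that keeps beating on schedule while returning wrong results.

```python
class HeartbeatMonitor:
    """Hypothetical monitor: a worker counts as alive if it beat within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_beat = None  # timestamp of the most recent heartbeat, if any

    def beat(self, now):
        # Record a heartbeat at time `now` (timestamps are passed in to keep the demo deterministic).
        self.last_beat = now

    def is_alive(self, now):
        # Liveness only: says nothing about whether the worker's output is correct.
        return self.last_beat is not None and now - self.last_beat <= self.timeout


def buggy_add(a, b):
    # Hypothetical worker logic with a silent bug: subtracts instead of adding.
    return a - b


monitor = HeartbeatMonitor(timeout=5.0)
results = []
for tick in range(3):
    results.append(buggy_add(2, 2))  # wrong answer every time
    monitor.beat(now=float(tick))    # but the heartbeat still goes out on schedule

print(monitor.is_alive(now=3.0))  # True: the monitor sees a healthy worker
print(results)                    # [0, 0, 0]: every result is wrong
```

The monitor does catch the simple case (no beat within the timeout), which is roughly the "simplest failure cases" mentioned above. It has no way of noticing the incorrect results.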