by bbrazil
3947 days ago
From experience, I think part of the problem is that not everyone appreciates the importance of correctness in this sort of system. At the very least you should clearly document your expected failure modes, so that it's possible to build correct things on top of it. It's also hard to convince everyone that it's worth spending time on what looks like an unlikely corner case until after the outage, at which point it becomes much harder to fix, because handling it usually needs to be built into the core design of your system.