| HN Mirror


by vardump 1442 days ago

Well, there's software that can cause some degree of harm, for example through servos controlling something physical. While you probably still can't catch all of the issues, you'd damn well better try as hard as you can within reason.

I'd also wish for similar rigor from people developing whatever filesystems my data is on. :-)

Fail fast is generally a good idea, if you can do it safely.

4 comments

marcosdumay 1442 days ago

If you can't fail safely, you'd better review your entire architecture.

Software fails. You can make failures rarer, but you can't make them go away. You have to deal with them; ignoring them is not an option.


vardump 1442 days ago

It's all really about risk management. Things can (and will) go wrong, and it doesn't only apply to software.

This involves a lot of thinking and collecting information about potential risks and evaluating their probability and severity.

Then you mitigate the worst risks first, ranked by probability times severity (other factors are also possible). Some residual risk always remains.
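A minimal sketch of that ranking, assuming each risk gets a rough probability estimate and a severity score. The `Risk` class, the `rank_risks` helper, the 1-10 severity scale, and the example values are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    probability: float  # estimated chance of occurrence, 0.0 to 1.0
    severity: float     # estimated impact on an arbitrary 1-10 scale

    @property
    def exposure(self) -> float:
        # The "probability times severity" score from the comment above.
        return self.probability * self.severity


def rank_risks(risks):
    """Return the risks sorted from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


# Hypothetical example values, invented for illustration only.
risks = [
    Risk("servo overshoot", probability=0.05, severity=9),
    Risk("log disk full", probability=0.30, severity=2),
    Risk("sensor dropout", probability=0.10, severity=7),
]

for risk in rank_risks(risks):
    print(f"{risk.name}: {risk.exposure:.2f}")
```

Mitigation effort would then go to the top of the list first. A plain product is only one possible score; a weighting for detectability or cost of mitigation could be folded into `exposure` the same way.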


sargun 1442 days ago

I think the idea is that there are error recovery semantics that:

1. Determine the last sane state of the system, and work forward from there. (Read the servo position and try to go from there)

2. Have a "recovery" routine to reset the system. (Take all positions to "zero")

3. Just stop and ask a human for help. (Yes, I know this can be bad.)
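The three options above could be sketched as a fallback chain. This is only an illustration: the `Recovery` enum, the `choose_recovery` function, the position limits, and the `can_home` flag are assumptions, not anything from a real servo API.

```python
from enum import Enum, auto


class Recovery(Enum):
    RESUME = auto()  # 1. work forward from the last sane state
    RESET = auto()   # 2. run the recovery routine (all positions to "zero")
    HALT = auto()    # 3. stop and ask a human for help


def choose_recovery(position, lo=-90.0, hi=90.0, can_home=True):
    """Pick a recovery strategy from a servo reading.

    `position` is the current reading, or None if the servo cannot be read.
    `lo` and `hi` are hypothetical limits of the sane range; `can_home`
    says whether a reset to "zero" is currently possible.
    """
    # A readable, in-range position counts as a sane state to resume from.
    if position is not None and lo <= position <= hi:
        return Recovery.RESUME
    # Otherwise fall back to resetting, if that is possible at all.
    if can_home:
        return Recovery.RESET
    # Nothing else is safe: stop and wait for an operator.
    return Recovery.HALT


print(choose_recovery(12.5))
print(choose_recovery(None))
print(choose_recovery(200.0, can_home=False))
```

The ordering here tries the least disruptive option first and only halts when nothing else applies; a system where stopping is the safest action would reverse that preference.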


vardump 1442 days ago

If feasible, electromechanical methods are good.


roeles 1441 days ago

> I'd also wish for similar rigor from people developing whatever filesystems my data is on. :-)

Stable storage is a key factor in making this philosophy work. [1]

[1] https://qconlondon.com/london-2012/qconlondon.com/dl/qcon-lo...


_moof 1441 days ago

"Litter the code with aborts and test the ever-loving hell out of it" is more or less the strategy we use with flight software.
