How to prevent software bugs from costing lives, as at Boeing or Tesla?
10 points
by leeu1911
2637 days ago
As an experienced software engineer, I know people make mistakes; to err is human. I work at dahmakan.com, a food delivery startup in Southeast Asia, where the worst that can happen when our software has a problem is that someone doesn't get a meal on time or a meal is wasted. For software at Boeing or Tesla, errors are far more critical, as we have seen. I would love to hear your suggestions and experience on preventing these costly mistakes.
Even your example, that the worst outcome of a software problem is a late or wasted meal, isn't really true. What if someone ordered fish or oysters, and the food was left out too long and caused food poisoning (just as an example)?
There are many levels to think about this problem on. Maybe you could put a sticker on the package that reacts to temperature and shows when the meal is no longer safe. You would still have to train the user to know what the sticker means and when the meal is safe to eat.
So even in this simple example, software, hardware, redundancy, and user training all have to work together. The same goes for cars or planes. You're really trying to build a safety-critical system, and often (as in the Boeing example) it isn't just the software or just the hardware that causes the problem; the issues arise where the two meet.
For Boeing, the list would include a lack of user training, a lack of good UX, possible hardware design issues that made the aircraft prone to stall, hardware issues with the angle of attack sensors, not enough redundancy in the angle of attack sensors for them to operate reliably, etc.
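To make the redundancy point concrete, here is a minimal sketch of sensor voting in Python. It is only an illustration of the general idea, not how Boeing or any real avionics system does it: the function name `vote_angle_of_attack` and the 5-degree tolerance are made up for this example. The idea is that with three or more sensors, one bad reading can be outvoted, and when the sensors disagree too much the software refuses to produce a value instead of acting on bad data.

```python
from statistics import median
from typing import Optional, Sequence

# Hypothetical tolerance (degrees): how far a reading may sit from the
# median before it is treated as faulty. Not a real avionics value.
MAX_DEVIATION_DEG = 5.0


def vote_angle_of_attack(
    readings: Sequence[float],
    max_deviation: float = MAX_DEVIATION_DEG,
) -> Optional[float]:
    """Return a voted value, or None when the sensors cannot be trusted."""
    if len(readings) < 3:
        # With fewer than three sensors a single fault cannot be outvoted.
        return None

    mid = median(readings)
    # Keep only the readings that are close to the median.
    agreeing = [r for r in readings if abs(r - mid) <= max_deviation]

    # Require a strict majority of sensors to agree with each other.
    if len(agreeing) * 2 <= len(readings):
        return None

    # Average the readings that agree.
    return sum(agreeing) / len(agreeing)
```

With readings of 4.0, 5.0, and 40.0 the outlier is discarded and the result is 4.5. With only two sensors, or with three readings that all disagree, the function returns None, and the caller has to fall back to something safe, such as alerting the pilot and disabling the automation.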
You can never get to a 0% chance of failure. Most of the time you are just attacking the most likely failure modes, because once you get down to the level of faulty parts or mechanical fatigue, things always break.
Of course, each subsystem and each integration point should be tested well enough to find all these things, but sadly that is less of a science and more of an art IMHO. And I used to work on rocket software.
Often the answers are simpler than you think. A simple design usually operates better than one with overcomplicated error handling. Sometimes you just need to change the whole way you are thinking about the problem.