by patrickthebold 699 days ago
> The configuration update triggered a logic error that resulted in an operating system crash.

> We understand how this issue occurred and we are doing a thorough root cause analysis to determine how this logic flaw occurred.

There are always going to be flaws in the logic of the code; the trick is to not have single errors be so catastrophic.


Yeah “how this logic flaw occurred” is the wrong question.

How a common bug was rolled out globally with no controls, testing, or rollback strategy is the right question.

They're all good questions. The thing that reads the config should have been fuzz tested with something like AFL. Likely should have a lot more tests. Maybe shouldn't run in a device driver. There's almost no doubt there are engineering process and culture issues here.

And then absolutely the release process.
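To make the fuzz-testing point concrete, here is a rough sketch in Python. It is not AFL (which is coverage-guided and targets native code); it is naive random-input fuzzing against a made-up length-prefixed format. `parse_config`, `ConfigError`, and the format itself are hypothetical and have nothing to do with the actual product. The idea is only that the parser must reject any byte string with a controlled error instead of reading out of bounds or crashing.

```python
import random
import struct
from typing import List


class ConfigError(ValueError):
    """The only failure the parser is allowed to raise on bad input."""


def parse_config(data: bytes) -> List[bytes]:
    """Parse a hypothetical format: 1-byte field count, then
    fields each prefixed by a 2-byte little-endian length."""
    if len(data) < 1:
        raise ConfigError("empty input")
    count = data[0]
    fields = []
    offset = 1
    for _ in range(count):
        # Check bounds before every read instead of trusting the header.
        if offset + 2 > len(data):
            raise ConfigError("truncated length prefix")
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        if offset + length > len(data):
            raise ConfigError("field runs past end of input")
        fields.append(data[offset:offset + length])
        offset += length
    if offset != len(data):
        raise ConfigError("trailing bytes after last field")
    return fields


def fuzz(iterations: int, seed: int = 0) -> int:
    """Feed random blobs to the parser and return how many were rejected.
    Any exception other than ConfigError propagates and fails the run."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(iterations):
        size = rng.randint(0, 64)
        blob = bytes(rng.getrandbits(8) for _ in range(size))
        try:
            parse_config(blob)
        except ConfigError:
            rejected += 1
    return rejected
```

Running `fuzz(100000)` in CI is cheap. A real setup would presumably use a coverage-guided fuzzer and a corpus of valid config files as seeds, since purely random bytes rarely get deep into a parser.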

Rollback is hard I guess once your OS can't boot.

> Rollback is hard I guess once your OS can't boot.

This is why the client needs to have enough error handling to realise its latest update has caused an unsuccessful boot, and to roll that update back locally to the last known good configuration (or completely back to factory and pull all updates again).
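A minimal sketch of that last-known-good logic, in Python for readability (a real client would do this very early in boot, in native code, with the state in storage that survives a crash). Everything here is an assumption for illustration: the `UpdateState` record, the function names, and the threshold of two failed boots. The trick is to count every boot as failed until the system reports otherwise.

```python
from dataclasses import dataclass

MAX_FAILED_BOOTS = 2  # assumed threshold before giving up on an update


@dataclass(frozen=True)
class UpdateState:
    active: str            # config version that will be loaded
    last_known_good: str   # last version that completed a boot
    failed_boots: int = 0  # boots started but never confirmed


def apply_update(state: UpdateState, new_version: str) -> UpdateState:
    """Switch to a new version but keep the old one as the fallback."""
    return UpdateState(new_version, state.last_known_good, 0)


def on_boot_start(state: UpdateState) -> UpdateState:
    """Run before loading the config. Roll back if the active
    version has already failed too many boots in a row."""
    active, failed = state.active, state.failed_boots
    if failed >= MAX_FAILED_BOOTS and active != state.last_known_good:
        active, failed = state.last_known_good, 0
    # Pessimistically count this boot as failed until confirmed.
    return UpdateState(active, state.last_known_good, failed + 1)


def on_boot_success(state: UpdateState) -> UpdateState:
    """Run once the system is up: promote the active version."""
    return UpdateState(state.active, state.active, 0)


# Simulated sequence: v1 boots fine, v2 crashes twice, third boot rolls back.
s = on_boot_success(on_boot_start(UpdateState("v1", "v1")))
s = apply_update(s, "v2")
s = on_boot_start(s)  # crash before on_boot_success
s = on_boot_start(s)  # crash again
s = on_boot_start(s)  # threshold reached, falls back to v1
```

The factory-reset variant would be the same check with a fixed baseline version in place of `last_known_good`.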