
by mips_r4300i 1055 days ago

Good points. Embedded systems deal with many of those. It reminded me of a funny story that taught me to pay closer attention to things like this:

Some time ago I shipped a product running an RTOS which unfortunately had a subtle scheduler bug that made it crash at random intervals. The bug was pretty rare (I thought), affected only part of the system, and took several days to reproduce each time.

In my infinite genius, rather than spend weeks of valuable time in the run-up to release chasing it, I set up the processor's watchdog timer to write a crash dump and silently reboot. A user would see maybe a few seconds of delayed input, and everything would come back up shortly after.

Unfortunately, I had accidentally set the watchdog clock divider the wrong way, resulting in the watchdog not activating for over 17 hours after a hang!
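
To show how easily a divider mistake turns seconds into hours, here is a minimal sketch of the arithmetic. Every number and name in it (a 32768 Hz watchdog clock, a 65536-count reload, `wdt_timeout_ms`, `wdt_config_ok`, the 5 second limit) is a hypothetical chosen for illustration, not the actual hardware or configuration from this story.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog parameters; not taken from any real part. */
#define WDT_CLOCK_HZ       32768u  /* assumed low-speed oscillator */
#define WDT_RELOAD_COUNTS  65536u  /* assumed full 16-bit counter range */
#define WDT_MAX_TIMEOUT_MS 5000u   /* assumed longest hang we will tolerate */

/* Time from the last kick until the watchdog fires, in milliseconds.
 * The counter ticks at clock_hz / divider, so a larger divider means
 * a slower counter and a longer timeout. */
static uint64_t wdt_timeout_ms(uint32_t clock_hz, uint32_t divider,
                               uint32_t reload_counts)
{
    if (clock_hz == 0u) {
        return UINT64_MAX; /* a stopped clock never fires */
    }
    return ((uint64_t)reload_counts * divider * 1000u) / clock_hz;
}

/* Sanity check that could run at boot or in a unit test before shipping. */
static bool wdt_config_ok(uint32_t divider)
{
    return wdt_timeout_ms(WDT_CLOCK_HZ, divider, WDT_RELOAD_COUNTS)
           <= WDT_MAX_TIMEOUT_MS;
}
```

With these made-up values, a divider of 1 gives a 2 second timeout, while a divider of 32768 gives 65536 seconds, a bit over 18 hours. A check like `wdt_config_ok` is cheap insurance when the configuration cannot be changed after release.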

The bug turned out to be far more noticeable once the product was released, and it was only by sheer luck that many people never ran into it.

I eventually fixed the scheduler bug in an update, but the useless watchdog configuration was set in stone and could not be fixed. Taught me never to assume a rare bug will stay rare once tens of thousands of people are using something in the field.