by lisper
486 days ago
Yeah, I have a war story...

I was working on mobile robot research at JPL back in the 1990s. We had a robot with an arm attached. It worked fine except that every now and then the whole system would crash hard with a totally corrupted heap and stack, just random data everywhere. So no chance of a backtrace. The really weird thing was that this only happened when the arm was moving. We also had the exact same system running under a different operating system and we never had any problems there, so we were 100% sure it was not a compiler error.

It was a compiler error.

It took us a year to figure out what was going on. It turned out that the compiler had a bug where it would emit code that would pop the stack pointer and then pull a value out of the now unprotected stack frame. On the non-embedded system this did not cause any problems, but on the embedded system (running VxWorks) hardware interrupts used the same stack as the process that was running when the interrupt hit. So if we happened to get an interrupt just after the stack pointer was popped but before the unprotected value was grabbed, that value would get stomped on by the interrupt handler. Then when the interrupt handler returned, the process would resume, grab the now-random value, and chaos ensued.
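For anyone who wants to see the race concretely, here is a rough toy model in Python. It is only a sketch under assumptions: the `Machine` class, the 16-slot stack, the junk values and both epilogue functions are made up for illustration, and none of it is the actual compiler output or VxWorks code. It just shows why reading a slot after releasing it is harmless until an interrupt handler borrows the same stack in between.

```python
# Toy model of a downward-growing stack shared by a task and its interrupt
# handler. All names are hypothetical; this is not real compiler output.

STACK_SIZE = 16


class Machine:
    def __init__(self):
        self.stack = [0] * STACK_SIZE
        self.sp = STACK_SIZE  # one past the top; the stack grows downward

    def push(self, value):
        self.sp -= 1
        self.stack[self.sp] = value

    def interrupt(self):
        # The handler runs on the current stack: it writes its own data just
        # below sp, then restores sp before returning to the task.
        saved_sp = self.sp
        for junk in (0xDEAD, 0xBEEF):
            self.push(junk)
        self.sp = saved_sp


def buggy_epilogue(m, interrupt_in_window):
    m.push(42)                # value living in the current frame
    m.sp += 1                 # bug: frame released before the value is read
    if interrupt_in_window:
        m.interrupt()         # handler overwrites the slot just below sp
    return m.stack[m.sp - 1]  # read from the now unprotected slot


def correct_epilogue(m, interrupt_in_window):
    m.push(42)
    if interrupt_in_window:
        m.interrupt()         # handler only writes below the live frame
    value = m.stack[m.sp]     # read while the slot is still protected
    m.sp += 1                 # release the frame afterwards
    return value
```

With no interrupt in the window, both versions return 42, which matches the bug staying invisible on the other operating system. Only `buggy_epilogue(Machine(), True)` comes back with the handler's junk instead of the original value.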