| About 15 years ago I was debugging an ARM7 memory corruption issue on an embedded target. The chip ran at 40 MHz and executed 32-bit ARM instructions, but the external data bus was only 8 bits wide, so fetching each instruction from external NOR flash took 4 bus cycles. That works out to an effective rate of ~10 MHz. We were good about doing code reviews, stacks weren't overflowing, etc., so it was puzzling. Finally, just like the article said, I figured the only way to find it was to catch it "red-handed", in the act. The good news was that the memory locations getting corrupted were always the same. Long story short, I set up a FIQ [1] -- some of you may know the FIQ -- which would check the location on each interrupt. I forget whether it checked "for" a particular value or that it "wasn't" the expected value, ugh, sorry... If the FIQ detected corruption, it did a while (1) that would trigger a breakpoint in the emulator. Then I'd be able to look at the task ID (we were running Micrium µC/OS-II, as I recall), the call stack, etc. Originally I set up a timer at 1 MHz to trigger the FIQ, but the overhead of going in and out of the ISR 1 million times per second, at essentially a 10 MHz rate, brought the processor to its knees. So I slowed the timer interrupt down to 100 kHz (!!), which still soaked up a lot of the CPU slack that we'd been running with. And time after time I'd hit the breakpoint in the FIQ, but the damage had been done microseconds earlier and the breadcrumbs didn't finger a culprit. Then it happened. Remember, the hardware timer runs completely asynchronously with respect to the application. Finally, the FIQ timer ISR interrupted some task's code in exactly the function, at exactly the place (maybe a couple of instructions later), where the corruption had occurred. Took about a day start to finish. I'd never seen or heard of using a high-speed timer to try to "catch memory corruption in the act", but as they say, necessity is the mother of invention.
And for the non-embedded developers: this is an embedded CPU. No MMU or MPU, etc., just a flat, wild-west open memory map. Read or write whatever you want. Literally every part of the code was suspect. Good times. [1] On ARM7/9, maybe ARM11, and I think also Cortex-R, the Fast Interrupt Request, or FIQ, uses banked registers and doesn't stack anything on entry, so it's the lowest-latency, lowest-overhead ISR you can have. But you can only have one FIQ, I believe, so you have to use it judiciously. |
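
For anyone who wants to picture the trick, here is a rough host-runnable sketch of the idea, not the original code (which is long gone). Every name in it (`watched_word`, `WATCH_EXPECTED`, `watch_fiq_handler`, `simulate`) is made up for illustration, and it assumes the handler compares against a known-good value, which is one of the two variants mentioned above. The "application" is modeled as numbered steps and the timer as a tick every `period` steps, so you can see why most hits land a few steps after the stray write and only occasionally right on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; this only models the technique on a host. */
#define WATCH_EXPECTED 0xA5A5A5A5u            /* assumed known-good value */

static volatile uint32_t watched_word = WATCH_EXPECTED; /* stands in for the
                                                 location that kept getting
                                                 corrupted */
static volatile bool     corruption_seen = false;
static volatile uint32_t interrupted_at  = 0;  /* where the app was when caught */

/* Body of the timer FIQ handler. On a real ARM7 the interrupted location would
 * be derived from the banked LR_fiq (return address is LR_fiq - 4); here the
 * caller passes it in so the sketch can run anywhere. */
void watch_fiq_handler(uint32_t where)
{
    if (watched_word != WATCH_EXPECTED && !corruption_seen) {
        corruption_seen = true;
        interrupted_at  = where;
        /* On target: spin in for (;;) {} here with a debugger breakpoint set. */
    }
}

/* The application runs steps 0..total_steps-1. The step numbered buggy_step
 * scribbles on watched_word. The timer fires after every `period` steps.
 * Returns the step after which the handler noticed, or -1 if it never did. */
int simulate(uint32_t buggy_step, uint32_t period, uint32_t total_steps)
{
    watched_word    = WATCH_EXPECTED;
    corruption_seen = false;
    interrupted_at  = 0;

    for (uint32_t step = 0; step < total_steps; step++) {
        if (step == buggy_step) {
            watched_word = 0;                  /* the stray write */
        }
        if ((step + 1) % period == 0) {        /* timer tick lands after this step */
            watch_fiq_handler(step);
            if (corruption_seen) {
                return (int)step;
            }
        }
    }
    return -1;
}
```

With a period of 10, a stray write at step 7 is only noticed after step 9, two steps too late to point at the offending code. A stray write at step 9 is noticed immediately, which is the lucky alignment the story eventually hit. Shrinking the period narrows the gap but costs more CPU, which is the 1 MHz vs. 100 kHz trade-off described above.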