Not the OP, but I've written assembly in our codebase for:
* First stage initialization
* Interrupt prologue/epilogue
* Bitbanging where you want deterministic cycle counts
* Using the FIQ for very high-priority interrupts. It has its own partially banked registers, so if you stay in r8-r13 you don't have to save and restore state at all.
So we don't use assembly for magic go faster juice, but instead when there's a coding constraint that we can't easily explain to the compiler.
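To make the FIQ point concrete, here is a rough C model of register banking. It is only a sketch: classic ARM cores give FIQ mode its own copies of r8-r14, and everything else here (`cpu_model_t`, `fiq_enter`, `fiq_exit`, `fiq_handler`) is a hypothetical name invented for illustration. The model ignores the SPSR and the return address that real hardware places in r14 on entry.

```c
#include <stdint.h>

enum { NUM_REGS = 16, BANK_FIRST = 8, BANK_LAST = 14 };

/* Toy model of the register file, NOT real hardware access. */
typedef struct {
    uint32_t r[NUM_REGS];                          /* registers visible in the current mode */
    uint32_t hidden[BANK_LAST - BANK_FIRST + 1];   /* the r8-r14 copies of the other mode */
} cpu_model_t;

/* Swap the visible r8-r14 with the hidden bank. */
static void swap_bank(cpu_model_t *cpu)
{
    for (int i = BANK_FIRST; i <= BANK_LAST; i++) {
        uint32_t tmp = cpu->r[i];
        cpu->r[i] = cpu->hidden[i - BANK_FIRST];
        cpu->hidden[i - BANK_FIRST] = tmp;
    }
}

/* Entering and leaving FIQ mode both amount to a bank switch in this model. */
static void fiq_enter(cpu_model_t *cpu) { swap_bank(cpu); }
static void fiq_exit(cpu_model_t *cpu)  { swap_bank(cpu); }

/* A handler that stays inside r8-r13, so nothing needs saving or restoring. */
static void fiq_handler(cpu_model_t *cpu)
{
    cpu->r[8] += 1u;          /* persistent counter: keeps its value between FIQs */
    cpu->r[9] = 0xDEADBEEFu;  /* scratch value, invisible to the interrupted code */
}
```

In this model the interrupted code sees all sixteen registers unchanged after `fiq_enter`, `fiq_handler`, `fiq_exit`, while the handler's r8 keeps counting across interrupts. A handler that touched r0-r7 would have to save and restore them itself.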
I'm curious to hear more about how and why you use the FIQ. From googling, it looks like it's an ARM feature. How does that work with a general-purpose OS, or are you running bare-metal ARM?
We've got our own internal RTOS, not a general purpose OS.
We use it for different things depending on the board, but my favorite is a really sweet profiler that's integrated with our watchdog logic and gives us ~1000 PC+SP samples from the 100 ms or so before our watchdog timer resets the board. That's invaluable for debugging.
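One way such a profiler could keep its samples is a fixed-size ring buffer that the high-priority interrupt overwrites continuously, so the most recent window is always available when the watchdog fires. The sketch below is an assumption about the general shape, not the poster's implementation: `profiler_t`, `profiler_record`, `profiler_get`, and the `SAMPLE_COUNT` of 1024 are all hypothetical. Roughly 1000 samples over 100 ms implies about one sample every 100 µs.

```c
#include <stdint.h>

#define SAMPLE_COUNT 1024u   /* hypothetical capacity, close to the ~1000 samples mentioned */

typedef struct {
    uint32_t pc;   /* program counter of the interrupted code */
    uint32_t sp;   /* stack pointer of the interrupted code */
} sample_t;

typedef struct {
    sample_t samples[SAMPLE_COUNT];
    uint32_t head;    /* slot the next sample will be written to */
    uint32_t count;   /* number of valid samples, saturates at SAMPLE_COUNT */
} profiler_t;

/* Called from the sampling interrupt: store one sample, overwriting the oldest. */
static void profiler_record(profiler_t *p, uint32_t pc, uint32_t sp)
{
    p->samples[p->head].pc = pc;
    p->samples[p->head].sp = sp;
    p->head = (p->head + 1u) % SAMPLE_COUNT;
    if (p->count < SAMPLE_COUNT) {
        p->count++;
    }
}

/* Return the i-th retained sample, where 0 is the oldest. Requires i < p->count. */
static sample_t profiler_get(const profiler_t *p, uint32_t i)
{
    uint32_t first = (p->head + SAMPLE_COUNT - p->count) % SAMPLE_COUNT;
    return p->samples[(first + i) % SAMPLE_COUNT];
}
```

For the samples to be readable after the watchdog reset, the buffer would have to live somewhere the reset does not clear, or be dumped just before the reset. How that is done depends on the board and is left out here.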