by phkahler 2780 days ago
>> Except most M0 cores will have 1 or 2 cycle internal flash memory reads, and this has slow external flash.

With 16K cache. I'm not sure how you ensure consistent performance though - make sure your code will all fit in 16K, but is that enough? And what about the first time through?


Even on ARM parts with built-in flash you can't easily guarantee that. The only way to do so is to copy the code to RAM and run it from there (most of the M0 to M4 devices I've seen don't have an icache). That's because of how these cores read flash (I can't remember the correct term, I want to say something like stop-waits): the processor stalls for an indeterminate time while the flash reads the next page.
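
For anyone wanting to try the copy-to-RAM route, here is a minimal sketch of how it is often done with GCC. The section name `.ramfunc`, the `RAM_FUNC` macro and the `checksum` function are hypothetical names picked for illustration. It assumes your linker script places that section in RAM and your startup code copies it there from flash, which varies by vendor and toolchain. On a non-ARM host the macro expands to nothing, so the function is just ordinary code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the linker script maps ".ramfunc" (a hypothetical name) into RAM
 * and the startup code copies it from flash before main() runs. */
#if defined(__GNUC__) && defined(__arm__)
#define RAM_FUNC __attribute__((section(".ramfunc"), noinline))
#else
#define RAM_FUNC /* host build: no special placement */
#endif

/* A timing-sensitive routine we would like to run without flash stalls. */
RAM_FUNC uint32_t checksum(const uint8_t *buf, uint32_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}
```

Note that any data the function reads (constant tables, for example) would also have to live in RAM, otherwise the stalls just move from instruction fetches to data loads.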

A 16K cache is likely enough to ensure stable performance for any given function and any tight loops you're using, but probably not for the entire program. You'll still have misses that cause slowdowns, though they probably won't be terribly noticeable unless you're trying to guarantee timing over large functions.
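
To make the "fits in 16K" argument concrete, here is a toy model, not the real part's cache. It assumes a direct-mapped cache with 32-byte lines and straight-line code fetched 4 bytes at a time. None of those details come from the chip's documentation, and all the names (`icache_t`, `run_pass`, and so on) are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed geometry: 16 KiB, direct-mapped, 32-byte lines. */
#define CACHE_BYTES 16384u
#define LINE_BYTES  32u
#define NUM_LINES   (CACHE_BYTES / LINE_BYTES)

typedef struct {
    uint32_t tag[NUM_LINES];
    uint8_t  valid[NUM_LINES];
} icache_t;

/* Start with an empty (cold) cache. */
void icache_reset(icache_t *c)
{
    memset(c, 0, sizeof *c);
}

/* Fetch one address; returns 1 on a miss (and fills the line), 0 on a hit. */
int icache_fetch(icache_t *c, uint32_t addr)
{
    uint32_t line = addr / LINE_BYTES;
    uint32_t idx  = line % NUM_LINES;
    uint32_t tag  = line / NUM_LINES;

    if (c->valid[idx] && c->tag[idx] == tag) {
        return 0;
    }
    c->valid[idx] = 1;
    c->tag[idx]   = tag;
    return 1;
}

/* Execute `size` bytes of straight-line code once; returns the miss count. */
unsigned run_pass(icache_t *c, uint32_t base, uint32_t size)
{
    assert(size % 4u == 0);
    unsigned misses = 0;
    for (uint32_t off = 0; off < size; off += 4u) {
        misses += (unsigned)icache_fetch(c, base + off);
    }
    return misses;
}
```

In this model an 8 KiB loop misses once per line on the first pass (256 misses) and never again, while a 32 KiB loop misses on every line of every pass because its second half keeps evicting its first half. A real cache with associativity would behave less harshly, but the first-pass cost is there either way.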

"Wait-states" is the term you're looking for.
STM32 has the ART accelerator for its internal flash. A 180 MHz F4 can run at full speed without any wait states most of the time.
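
For a feel of the numbers, here is a rough sketch of the wait-state arithmetic: flash can only be read at some maximum rate, so each access has to be stretched over enough CPU cycles to fit. The function name is hypothetical, and the 30 MHz used in the note below is an assumed figure for illustration; the real limit depends on the part and supply voltage, so check the datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of wait states `ws` such that one flash access,
 * taking (ws + 1) CPU cycles, does not exceed the flash's maximum rate. */
unsigned flash_wait_states(uint32_t cpu_hz, uint32_t flash_max_hz)
{
    assert(cpu_hz > 0 && flash_max_hz > 0);
    /* ceil(cpu_hz / flash_max_hz) - 1, using integer arithmetic */
    return (unsigned)((cpu_hz + flash_max_hz - 1u) / flash_max_hz) - 1u;
}
```

With an assumed 30 MHz flash limit, `flash_wait_states(180000000u, 30000000u)` gives 5, so an uncached fetch costs 6 cycles instead of 1. That gap is what a prefetch buffer or accelerator has to hide.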