by o11c 904 days ago
You probably want to defer the division until output, and not redo the loop each time. Instead, keep a running sum: when adding the current sample, subtract the sample from N samples back.
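A minimal sketch of that running-sum idea in C, assuming integer samples and a fixed window. The names `moving_avg`, `ma_push`, `ma_value`, and the window length `WINDOW` are made up for illustration and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 4  /* hypothetical window length N */

typedef struct {
    int32_t buf[WINDOW];  /* last N samples; zero-initialize before use */
    size_t  head;         /* index of the oldest sample */
    int64_t sum;          /* running sum of everything in buf[] */
} moving_avg;

/* Add a sample: subtract the one from N samples back, add the new one.
 * No loop over the window and no division here. */
static void ma_push(moving_avg *m, int32_t sample)
{
    assert(m != NULL);
    m->sum += (int64_t)sample - m->buf[m->head];
    m->buf[m->head] = sample;            /* overwrite the oldest slot */
    m->head = (m->head + 1) % WINDOW;    /* advance the ring buffer */
}

/* The division is deferred until an output value is actually needed. */
static int32_t ma_value(const moving_avg *m)
{
    assert(m != NULL);
    return (int32_t)(m->sum / WINDOW);
}
```

Start from `moving_avg m = {0};` and note that until WINDOW samples have been pushed, the average is taken over a partly zero-filled window. The 64-bit integer sum is exact, so the subtract-and-add update does not drift.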

To be fair, I typically use the MAC in an FPGA to implement this, with the coefficient as a fixed-point value. As a result the entire step takes one cycle, and depending on how many MACs are available I will parallelize the algorithm across all available MAC blocks.

But in C I typically compute the reciprocal of the coefficient outside the loop and multiply by it, i.e. sample * (1/coefficient) rather than sample/coefficient. Even the STM32 microcontrollers have a single-cycle multiply available.
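Roughly what that looks like, as a sketch assuming float samples scaled in place. The function name `scale_samples` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scale every sample by 1/coefficient using one division in total. */
static void scale_samples(float *samples, size_t n, float coefficient)
{
    assert(samples != NULL || n == 0);
    assert(coefficient != 0.0f);

    /* The only division, hoisted out of the loop. */
    const float inv = 1.0f / coefficient;

    for (size_t i = 0; i < n; i++) {
        samples[i] *= inv;  /* multiply instead of divide per sample */
    }
}
```

One caveat: multiplying by a rounded reciprocal is not always bit-identical to dividing, so results can differ in the last bit unless the reciprocal is exactly representable (for example, when the coefficient is a power of two).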

This depends on how many samples you have and what kind of precision you're working with for the accumulator. As floats get bigger they lose absolute precision, so small samples added to a large running sum can be rounded away.
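A small sketch of that effect, assuming IEEE 754 single and double precision. The helper names `add_repeated_f` and `add_repeated_d` are made up for illustration. At 2^24 = 16777216 a 32-bit float can no longer represent the next odd integer, so adding 1.0f to an accumulator that large changes nothing.

```c
#include <assert.h>

/* Add `small` to `big` `times` times in a float accumulator.
 * volatile forces each intermediate result to be rounded to float,
 * even on targets that evaluate in wider registers. */
static float add_repeated_f(float big, float small, int times)
{
    assert(times >= 0);
    volatile float acc = big;
    for (int i = 0; i < times; i++) {
        acc = acc + small;
    }
    return acc;
}

/* Same loop with a double accumulator for comparison. */
static double add_repeated_d(double big, double small, int times)
{
    assert(times >= 0);
    volatile double acc = big;
    for (int i = 0; i < times; i++) {
        acc = acc + small;
    }
    return acc;
}
```

With these, `add_repeated_f(16777216.0f, 1.0f, 10)` stays at 16777216, while `add_repeated_d(16777216.0, 1.0, 10)` reaches 16777226. A wider accumulator, or an integer or fixed-point one, avoids the problem for realistic sample counts.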