by ChuckMcM 904 days ago
To be fair, I typically implement this with the MAC in an FPGA, storing the coefficient as a fixed-point value. As a result, the entire step takes one cycle. Depending on how many MACs are available, I parallelize the algorithm across all of the available MAC blocks.
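For readers who haven't used fixed point, the per-sample work a MAC slice does can be modeled in C. This is a minimal sketch, assuming 16-bit samples and a coefficient stored in Q15 format (value = coeff_q15 / 32768, so it can only represent [-1, 1)); `mul_q15` and `scale_block_q15` are hypothetical names, not anything from the original setup. Each lane is one 16x16 to 32-bit multiply plus a round and shift, which is roughly what one MAC block handles per cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply one sample by a Q15 coefficient (hypothetical helper).
 * The 16x16 -> 32-bit product is the operation a MAC slice performs;
 * adding 1 << 14 before the shift rounds to nearest (ties go up). */
static int16_t mul_q15(int16_t sample, int16_t coeff_q15)
{
    int32_t product = (int32_t)sample * (int32_t)coeff_q15;
    product += 1 << 14;                   /* rounding constant for the >> 15 */
    int32_t r = product >> 15;            /* assumes arithmetic right shift */
    if (r > INT16_MAX) r = INT16_MAX;     /* only -32768 * -32768 overflows */
    if (r < INT16_MIN) r = INT16_MIN;
    return (int16_t)r;
}

/* Scale a buffer by one Q15 coefficient. On an FPGA each iteration could
 * map to its own MAC block; here it is just a sequential loop. */
static void scale_block_q15(int16_t *out, const int16_t *in, size_t n,
                            int16_t coeff_q15)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = mul_q15(in[i], coeff_q15);
    }
}
```

For example, a coefficient of 0.5 is 16384 in Q15, so `mul_q15(1000, 16384)` yields 500. Right-shifting a negative `int32_t` is implementation-defined in C, though mainstream compilers shift arithmetically; coefficients outside [-1, 1) would need a different Q format.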

But in C, I typically compute the reciprocal of the coefficient once, outside the loop, and then multiply: sample * (1/coefficient) instead of sample / coefficient. Even the STM32 microcontrollers have a single-cycle multiply.
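The reciprocal trick looks like this in practice. This is a minimal sketch, assuming float samples; `scale_by_reciprocal` and `divisor` are hypothetical names. The single division is hoisted out of the loop so the loop body is one multiply per sample.

```c
#include <assert.h>
#include <stddef.h>

/* Divide every sample by the same divisor using only one division:
 * compute the reciprocal once, then multiply inside the loop. */
static void scale_by_reciprocal(float *out, const float *in, size_t n,
                                float divisor)
{
    assert(out != NULL && in != NULL);
    assert(divisor != 0.0f);
    const float inv = 1.0f / divisor;   /* hoisted: the only divide */
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] * inv;           /* one multiply per sample */
    }
}
```

One caveat: unless the divisor is a power of two, `x * (1.0f / d)` can differ from `x / d` in the last bit, which is why compilers generally won't make this substitution on their own unless told to relax IEEE semantics (for example GCC's `-freciprocal-math`). For most signal-processing work that difference is negligible.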