by kr7
3404 days ago
That page has a warning at the top:

> IMPORTANT: Useful feedback revealed that some of these measures are seriously flawed. A major update is on the way.

Looking over the results, some of the numbers are off. On Intel CPUs, FP multiplication is faster than integer division. That might not hold on ARM CPUs, which generally have slower FPUs.

On Skylake, for example, 32-bit unsigned integer division has a 26-cycle latency with a throughput of 1 instruction per 6 cycles, while 32/64-bit floating-point multiplication has a 4-cycle latency with a throughput of 2 instructions per cycle.

Source: http://agner.org/optimize/instruction_tables.pdf
|
For divisions by a constant that doesn't easily decompose into shifts, you can fall back to multiplying by a magic constant (a fixed-point approximation of the reciprocal) and then shifting right. (This is also something compilers do, and it is what's being explained in the article.)
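
To make that concrete, here is a minimal sketch in C of the multiply-and-shift trick for unsigned 32-bit division by 10 and by 3. The function names (`div10`, `div3`, `check_magic_division`) are made up for illustration. The constants are the usual ceil(2^k / d) values, which happen to be exact over the full 32-bit range for these two divisors; other divisors can need a different shift or an extra fix-up step, so don't treat this as a general recipe.

```c
#include <assert.h>
#include <stdint.h>

/* x / 10 without a division instruction.
 * 0xCCCCCCCD == ceil(2^35 / 10); the 64-bit product cannot overflow
 * because both factors are below 2^32. */
static uint32_t div10(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* x / 3 the same way: 0xAAAAAAAB == ceil(2^33 / 3). */
static uint32_t div3(uint32_t x) {
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

/* Illustrative self-check: compare against the ordinary '/' operator
 * on a strided sample of inputs plus the largest 32-bit value. */
static void check_magic_division(void) {
    for (uint64_t x = 0; x <= UINT32_MAX; x += 65521) {
        assert(div10((uint32_t)x) == (uint32_t)x / 10);
        assert(div3((uint32_t)x) == (uint32_t)x / 3);
    }
    assert(div10(UINT32_MAX) == UINT32_MAX / 10);
    assert(div3(UINT32_MAX) == UINT32_MAX / 3);
}
```

In practice you'd rarely write this by hand: with optimizations on, mainstream compilers typically emit an equivalent multiply and shift for `x / 10` on their own.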