by magicalhippo 1386 days ago
Denormalized numbers are one reason why you really want to think carefully if you try to optimize code by rewriting expressions involving multiplication and division.

For example, if you have "x = (a / b) * (c / d)", one might think that rewriting it as "x = (a * c) / (b * d)" will save you a division and gain you speed. It will and it might, respectively.

However, it can also break an otherwise safe operation. If the numbers are very small, but still normal, then the product (b * d) might be a denormalized number, or underflow to zero outright, and dividing by it can lose precision or result in +/- infinity.

However, the code might guarantee that the ratios (a / b) and (c / d) are not too small or too large, so that multiplying them is guaranteed to lead to a useful result.
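
To make the failure mode concrete, here is a minimal Python sketch (Python floats are IEEE 754 doubles). The function names and the input values are chosen purely for illustration and are not from the comment above. In the first case (b * d) becomes subnormal and the rewritten form quietly loses precision; in the second it underflows to zero. Python raises ZeroDivisionError there, where C-style IEEE 754 arithmetic would return +/- infinity instead.

```python
import sys

def two_divisions(a, b, c, d):
    """Original form: x = (a / b) * (c / d)."""
    return (a / b) * (c / d)

def one_division(a, b, c, d):
    """Rewritten form: x = (a * c) / (b * d)."""
    return (a * c) / (b * d)

def is_subnormal(v):
    """True if v is nonzero but smaller in magnitude than the smallest normal double."""
    return 0.0 < abs(v) < sys.float_info.min

# Case 1: all inputs are normal, but b * d lands in the subnormal range,
# where only a few significant bits remain.
a = c = 1e-10
b = d = 1e-160
print(is_subnormal(b * d))        # True
print(two_divisions(a, b, c, d))  # close to 1e300
print(one_division(a, b, c, d))   # visibly off from 1e300

# Case 2: b * d underflows all the way to zero.
a2 = c2 = 1e-20
b2 = d2 = 1e-170
print(b2 * d2 == 0.0)             # True
print(two_divisions(a2, b2, c2, d2))  # still close to 1e300
try:
    one_division(a2, b2, c2, d2)
except ZeroDivisionError:
    # Python raises here; plain IEEE 754 division would yield +inf.
    print("division by zero")
```

In both cases the ratios (a / b) and (c / d) are around 1e150, comfortably inside the normal range, which is exactly the guarantee the original form relies on.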

3 comments

Here is a really cool automatic tool that rewrites floating point expressions to be more accurate: https://herbie.uwplse.org/
Anyway, since there aren't any dependencies between a, b, c, and d, I would expect the two divisions to end up basically in parallel in the pipeline. So the critical path is a division and a multiplication either way. Of course that is just a guess.
That assumes you can do multiple divisions in parallel. Back in the good old days, a single division unit was the norm, and it still is on most microcontrollers (assuming they even have hardware floating-point division[1]).

Does anyone have any references on the current state of affairs on modern AMD/Intel CPUs?

[1]: ARM Cortex-M4, for example, can have a hardware FPU, but one where division and sqrt are optional, see https://developer.arm.com/documentation/102832/latest/

https://en.wikichip.org/wiki/intel/microarchitectures/sunny_...

Looks like one FP divider on modern Intel. Though you can pack multiple divisions into an instruction.

For AMD I can find throughput numbers but not how many dividers there are, in a brief search. I'd guess two??

Interesting! Looks like my guess was off -- mea culpa.
It appears that Agner Fog's website is down at the moment, so we must conclude that the universe does not want to share this knowledge.
e^ln(x) = e^(ln(a) - ln(b) + ln(c) - ln(d)) would be nice to extend to zero/negative numbers.
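
One way the log-domain idea might be extended, as a rough sketch only: for x = (a / b) * (c / d), take logs of the absolute values (adding those of a and c, subtracting those of b and d), track the sign separately, and special-case zero. The function name log_domain_ratio and the choice to raise on a zero divisor are assumptions made for this example.

```python
import math

def log_domain_ratio(a, b, c, d):
    """Compute (a / b) * (c / d) via logarithms of absolute values."""
    if b == 0 or d == 0:
        raise ZeroDivisionError("b and d must be nonzero")
    if a == 0 or c == 0:
        # log(0) is undefined, so a zero numerator is handled up front.
        return 0.0
    # The result is negative when an odd number of inputs are negative.
    negatives = sum(1 for v in (a, b, c, d) if v < 0)
    sign = -1.0 if negatives % 2 else 1.0
    log_x = (math.log(abs(a)) - math.log(abs(b))
             + math.log(abs(c)) - math.log(abs(d)))
    # math.exp raises OverflowError if the true result is too large for a double.
    return sign * math.exp(log_x)

print(log_domain_ratio(1e-10, 1e-160, 1e-10, 1e-160))
print(log_domain_ratio(-6.0, 3.0, 2.0, 4.0))
print(log_domain_ratio(0.0, 3.0, 2.0, 4.0))
```

The intermediate sums never leave the normal range, but the approach is slower and trades away some accuracy: an absolute rounding error in log_x turns into a relative error in the result.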