by Findecanor 1049 days ago
Doesn't the risk of lack of precision apply to both floating point and fixed-point? I'd think the reciprocal would need to be exactly represented in the type, or to have more bits than the original type to produce a correct result in the last bit (or two bits?).
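
As a minimal sketch of the "more bits than the original type" point, here is division of a 32-bit unsigned value by 3 through a fixed-point reciprocal. The constant 0xAAAAAAAB is ceil(2^33 / 3), and the product is formed in 64 bits before shifting. The names `div3` and `check_div3` are made up for illustration, and the divisor 3 is just an example.

```c
#include <assert.h>
#include <stdint.h>

/* Divide by 3 with a multiply and a shift instead of a divide.
 * 0xAAAAAAAB == ceil(2^33 / 3). The reciprocal itself fills 32 bits and
 * the product needs up to 64, i.e. more bits than the 32-bit operand. */
static uint32_t div3(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}

/* Spot-check against the real division at a few edge values. */
static void check_div3(void)
{
    const uint32_t samples[] = { 0u, 1u, 2u, 3u, 0x7FFFFFFFu,
                                 0xFFFFFFFEu, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(div3(samples[i]) == samples[i] / 3u);
}
```

Doing the same multiply in only 32 bits would drop the high half of the product, which is where the quotient lives.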

Also, with integers, an arithmetic (signed) right shift rounds down (towards negative infinity), whereas the division operator or instruction in many languages and instruction sets rounds towards 0.

To adjust the rounding, you'd add a bias of 2^amount - 1 to negative values before shifting; the bias can be built from the sign bit. Let's say that 'x' is a signed long, a signed long has 64 bits, and 'amount' is between 1 and 63, then:

result = (x + (long)((unsigned long)(x >> 63) >> (64 - amount))) >> amount;
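
The difference between the two rounding modes for a power-of-two divisor can be sketched as below. This assumes a two's-complement target where `>>` on a negative signed value is an arithmetic shift (implementation-defined in C). The function names `shift_floor` and `shift_trunc` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 2^amount: a plain arithmetic shift. */
static int64_t shift_floor(int64_t x, unsigned amount)
{
    assert(amount >= 1 && amount <= 63);
    return x >> amount;
}

/* Truncating division by 2^amount (rounds towards 0, like C's `/`). */
static int64_t shift_trunc(int64_t x, unsigned amount)
{
    assert(amount >= 1 && amount <= 63);
    /* x >> 63 is all ones for negative x and zero otherwise; the logical
     * shift keeps the low `amount` bits, giving 2^amount - 1 or 0. */
    int64_t bias = (int64_t)((uint64_t)(x >> 63) >> (64 - amount));
    return (x + bias) >> amount;
}
```

For example, `shift_floor(-3, 2)` gives -1, while `shift_trunc(-3, 2)` gives 0, matching `-3 / 4` in C.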

1 comment

> Doesn't the risk of lack of precision apply to both floating point and fixed-point? I'd think the reciprocal would need to be exactly represented in the type, or to have more bits than the original type to produce a correct result in the last bit (or two bits?).

Yes, but this was historically okay on an x87 FPU, whose 80-bit extended-precision registers (64-bit significand) were more precise than the common external formats (32-bit float and 64-bit double).