Doesn't the risk of lack of precision apply to both floating point and fixed-point?
I'd think the reciprocal would need to be exactly represented in the type, or to have more bits than the original type to produce a correct result in the last bit (or two bits?).
Also, with integers, a signed right shift is rounding down (towards negative infinity), whereas the division operator/instruction in many languages/hardware is rounding towards 0.
To adjust the rounding, you'd add the sign-bit to the first fractional bit before shifting the last step. Let's say that 'x' is a signed long, and a signed long has 64 bits, then:
> Doesn't the risk of lack of precision apply to both floating point and fixed-point? I'd think the reciprocal would need to be exactly represented in the type, or to have more bits than the original type to produce a correct result in the last bit (or two bits?).
Yes, but this was historically okay on an x87 FPU which had more precise representation than the common external formats.
Also, with integers, a signed right shift is rounding down (towards negative infinity), whereas the division operator/instruction in many languages/hardware is rounding towards 0.
To adjust the rounding, you'd add the sign-bit to the first fractional bit before shifting the last step. Let's say that 'x' is a signed long, and a signed long has 64 bits, then:
result = ((x >> amount-1) + ((unsigned long)x >> 63)) >> 1;