| This paper missed the state of the art!!! This concerns unsigned division by a 32 bit constant divisor. Compilers routinely optimize this to a high-multiply by a "magic number" but this number may be up to 33 bits (worst-case, divisor dependent, 7 is a problem child). So you may need a 32x33-bit high multiply. What compilers do today is some bit tricks to perform a 32x33 bit high multiply through a 32x32 multiply and then handling the last bit specially, through additions and bitshifts. That's the "GM Method" in the paper; the juice to be squeezed out is the extra stuff to handle the 33rd bit. What the paper observes is that 32x33 high multiply can be performed via a 64x64 high multiply and then the extra stuff goes away. Well yes, of course. But amazingly in ALL of these worst case situations you can compute a different magic number, and perform a (saturating) increment of the dividend at runtime, and ONLY need a 32 bit high multiply. That is, these are the two algorithms for unsigned division by constants where the magic number overflows: - This paper: Promote to twice the bit width, followed by high multiply (32x64 => 64) and then bitshift - SOTA: Saturating increment of the dividend, then high multiply (32x32 => 32) and then bitshift Probably the paper's approach is a little better on wide multipliers, but they missed this well-known technique published 15 years ago (and before that, I merely rediscovered it): https://ridiculousfish.com/blog/posts/labor-of-division-epis... |