Hacker News new | ask | show | jobs
by vitus 722 days ago
If y is always a power of 2 (as suggested in the comments), then I'd expect it to turn into an AND of some sort.

And more generally, with older architectures, integer division was much slower than integer multiplication, so compilers would generally transform this into a multiplication plus some shifts [0]. For context in that timeframe, MUL on Sandy Bridge introduces 3-4 cycles worth of latency (depending on the exact variant), compared to DIV introducing 20+ (per Agner Fog's excellent instruction tables [1]). So even computing x - y * (x / y) with the clever math to replace x/y would be much faster than just x%y. (It's somewhat closer today, but integer division is still fairly slow.)

[0] https://news.ycombinator.com/item?id=1131177 (the linked article 404s now, but it's archived: https://web.archive.org/web/20110222015211/https://ridiculou...)

[1] page 220 of https://www.agner.org/optimize/instruction_tables.pdf

1 comments

> So even computing x - y * (x / y) with the clever math to replace x/y would be much faster than just x%y.

That only works when y is constant. Otherwise, you need to work out what to replace x/y with… which ultimately takes longer than just using the DIV instruction.

> That only works when y is constant.

Excellent point! That said, that was the case in this particular example.

> This 16-bit modulo was just a final check that the correct number of bytes or bits were being sent (expecting remainder zero), so the denominator was going to be the same every time.

(Libraries like libdivide allow you to memoize the magic numbers for frequently-used denominators, and if on x86 you have floating point operations with more precision than you need for integer division, you can potentially use the FPU instead of the ALU: https://lemire.me/blog/2017/11/16/fast-exact-integer-divisio...)