Hacker News new | ask | show | jobs
by pcordes 3226 days ago
One common trick is to handle the %3 and %5 with down-counters that you reset to 3 or 5 when they reach zero. Or better, unroll the loop by 3 so you only need one down-counter.

(Also, no, `%5` isn't "horrifically" slow when it's a compile-time-constant modulus: you or the compiler can do it with a multiply and shift for the integer division, and then x%5 = x - (x/5 * 5). https://godbolt.org/g/3HwBrF. It still sucks though.)

I wrote an x86 assembly FizzBuzz for fun a while ago, intended to be an example of how to optimize (https://stackoverflow.com/a/37494090/224132). Some parts of it kind of suck, though. I made some parts efficient (like appending "buzz\n" to a buffer with a mov [rdx], r13 / add rdx, 5), but left in too many conditional branches and a function call instead of inlining everything when unrolling.

I'm glad so many people like my post that the OP linked. It was fun to write. :)

1 comments

Indeed, llvm is smart enough not to use a DIV instruction for modulo of a constant.

Turns out you can compute x%3 == 0 with even fewer steps, it's (x*0xaaaaaaaab) < 0x55555556 (assuming wrapping unsigned 32 bit arithmetic).

Nice. It seems gcc/clang don't know that trick. They still divide and then check that x/3 * 3 == x. https://godbolt.org/g/V4kfuu. (So do ICC17 and MSVC CL19).