|
|
|
|
|
by pcordes
3226 days ago
|
|
One common trick is to handle the %3 and %5 with down-counters that you reset to 3 or 5 when they reach zero. Or better, unroll the loop by 3 so you only need one down-counter. (Also, no, `%5` isn't "horrifically" slow when it's a compile-time-constant modulus: you or the compiler can do it with a multiply and shift for the integer division, and then x%5 = x - (x/5 * 5). https://godbolt.org/g/3HwBrF. It still sucks though.) I wrote an x86 assembly FizzBuzz for fun a while ago, intended to be an example of how to optimize (https://stackoverflow.com/a/37494090/224132). Some parts of it kind of suck, though. I made some parts efficient (like appending "buzz\n" to a buffer with a mov [rdx], r13 / add rdx, 5), but left in too many conditional branches and a function call instead of inlining everything when unrolling. I'm glad so many people like my post that the OP linked. It was fun to write. :) |
|
Turns out you can compute x%3 == 0 with even fewer steps, it's (x*0xaaaaaaaab) < 0x55555556 (assuming wrapping unsigned 32 bit arithmetic).