|
Thanks for replying! It's definitely a nice write-up. You are right that micro-benchmarks are a bit suspicious, but they are better than nothing. Btw, have a look at https://godbolt.org/z/zMarEnYP5 to see what Clang comes up with on its own:
#include <stdint.h>
uint64_t div(uint64_t nlz) {
    __builtin_assume(nlz <= 63);
    return (63 - nlz) / 7;
}
div: # @div
    xori a0, a0, 63
    andi a1, a0, 255
    li a2, 37
    mul a1, a1, a2
    srli a1, a1, 8
    subw a0, a0, a1
    slli a0, a0, 56
    srli a0, a0, 57
    add a0, a0, a1
    srli a0, a0, 2
    ret
This is RISC-V assembly, just for fun; the x86 output is also fascinating. I haven't analysed it in detail, but Clang doesn't seem to mind using a multiplication. |
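For anyone who wants to poke at it, here is a rough instruction-by-instruction reading of the listing as plain C. This is only a sketch of my reading, not anything Clang emits: the function names (div_like_clang, div_reference, count_mismatches) are made up for illustration. As far as I can tell it is the usual multiply-and-shift division by a constant done in 8 bits: with t = (37 * x) >> 8, the last steps compute (x + t) >> 3, which works out to (293 * x) >> 11.

```c
#include <stdint.h>

/* Hypothetical name: a line-by-line C reading of the RISC-V listing. */
uint64_t div_like_clang(uint64_t nlz) {
    uint64_t x = nlz ^ 63;              /* xori: same as 63 - nlz when nlz <= 63 */
    uint64_t t = ((x & 255) * 37) >> 8; /* andi, li, mul, srli */
    uint64_t d = ((x - t) & 255) >> 1;  /* subw, then slli 56 / srli 57 keeps the low 8 bits, halved */
    return (d + t) >> 2;                /* add, srli */
}

/* The original expression, for comparison. */
uint64_t div_reference(uint64_t nlz) {
    return (63 - nlz) / 7;
}

/* Counts inputs in 0..63 where the two versions disagree. */
int count_mismatches(void) {
    int mismatches = 0;
    for (uint64_t nlz = 0; nlz <= 63; nlz++) {
        if (div_like_clang(nlz) != div_reference(nlz)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Calling count_mismatches() is a quick way to check the reading against the original expression over the whole assumed range 0..63.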