by AstralStorm
2958 days ago
And neither of these bit-twiddling tricks is much use on ARM NEON, since bit operations in vector form are very limited (plus there are pipeline stalls). Also, multiply-adds can be fused, so if you're already doing one it can be faster to just multiply by a different number instead of bit twiddling.
|
I'm not sure which bit hack you're talking about that's done with a multiply or multiply-add. There's a nice use of De Bruijn sequences for taking the log2 of a value with a single bit set that's very instructive - is that what you meant?
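
For anyone who hasn't seen the De Bruijn trick, here is a minimal sketch in C, assuming 32-bit inputs with exactly one bit set. The constant 0x077CB531 is the one commonly used for this; the names `log2_single_bit`, `debruijn_table` and `debruijn_init` are made up for illustration, and the table is built at first use rather than hardcoded. The lazy init is not thread-safe.

```c
#include <assert.h>
#include <stdint.h>

/* 0x077CB531 is a 32-bit De Bruijn sequence: each of its 32 five-bit
   windows is distinct, so (1 << i) * constant puts a unique 5-bit
   pattern in the top bits for every i in 0..31. */
#define DEBRUIJN32 UINT32_C(0x077CB531)

static uint8_t debruijn_table[32];
static int debruijn_ready = 0;

/* Fill the lookup table by running every single-bit value through the
   same multiply-and-shift that the lookup uses. */
static void debruijn_init(void) {
    for (uint32_t i = 0; i < 32; i++) {
        uint32_t hash = (uint32_t)((UINT32_C(1) << i) * DEBRUIJN32) >> 27;
        debruijn_table[hash] = (uint8_t)i;
    }
    debruijn_ready = 1;
}

/* Returns the index of the set bit. v must be a power of two. */
static unsigned log2_single_bit(uint32_t v) {
    assert(v != 0 && (v & (v - 1)) == 0);
    if (!debruijn_ready) {
        debruijn_init();
    }
    return debruijn_table[(uint32_t)(v * DEBRUIJN32) >> 27];
}
```

Calling `log2_single_bit(UINT32_C(1) << 20)` returns 20. For an arbitrary nonzero `v` you would isolate the lowest set bit first with `v & (0u - v)`, which gives a count-trailing-zeros. The whole thing is one multiply, one shift and one table load, with no branches once the table exists.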