|
Fast integer log2(): static unsigned int mylog2 (unsigned int val) {
unsigned int ret = -1;
while (val != 0) {
val >>= 1;
ret++;
}
return ret;
}
Faster log2() (thanks to [1]): static const char LogTable256[256] =
{
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
-1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};
unsigned int v; // 32-bit word to find the log of
unsigned r; // r will be lg(v)
register unsigned int t, tt; // temporaries
if (tt = v >> 16)
{
r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
}
else
{
r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
}
Fastest log2() on x86: #include <stdint.h>
static inline uint32_t log2(const uint32_t x) {
uint32_t y;
asm ( "\tbsr %1, %0\n"
: "=r"(y)
: "r" (x)
);
return y;
}
On ARM, one can replace bsr with clz.You can figure out that a logarithm in base 2 is easy to compute in base 2 arithmetic (sounds kinda obvious, too); it's the position of the most significant bit set. Now someone needs to come and point out that BSR has a latency 16 times that of AND, and 4 times the throughput. Touche, gentlemen. [1] http://graphics.stanford.edu/~seander/bithacks.html#IntegerL... |