Hacker News new | ask | show | jobs
by RagnarD 1207 days ago
Wouldn’t a lookup table be a whole lot easier? I’d think a 64 entry table of integers would be fast to access once cached.
4 comments

It's possible to use a lookup table (sort of) in a register. With x86-64-v2 and later, the code is larger, but potentially faster due to the shorter dependency chain:

int div7 (int x) { return 9 - __builtin_popcountll (0x4081020408102040ULL >> x); }

https://godbolt.org/z/Wj6P6jYGM

I think any monotonic function on 0 .. 63 can be written this way, so should be possible to fold in the outer 63 - nlz expression, too.

This is a very cool technique. Someone ought to benchmark this vs the method described in the original article.
I don't use lookup tables that much when optimizing these days, because most math instructions cost far fewer cycles than a cache miss. And even if your lookup table is in L3, it still costs ~40 cycles to retrieve each line.
That's a good instinct to have. However in this case the input is 0..63 and the output comfortably sits in a single byte. It can fit in just one cache line if you bother to align it.

The miss cost is, therefore, not really relevant: if this code is hotpath at all the cost of a single cache miss is amortized across millions of calls. Your lookup table will be in cache as surely as the code that reads it is.

Maybe easier, if limited to the specific case of an input in the range [0..63] and a divisor of 7, but not necessarily faster. Keep in mind the latency of a memory read is typically around 4 cycles, and at least some architectures have a tighter limit on the number of concurrent memory transactions than the number of concurrent integer math instructions, and keep in mind the article is generalizing beyond the initial problem involving the numbers 63 and 7- if you have more than a couple different divisors, and if you want to allow the input to go as large as possible, then a lookup table quickly becomes wasteful and impractical.
The divisor is from 0 to 63. The dividend is a variable bit length integer, i.e. a bigint.
Absolutely not, he needs to compute (63 - x) / 7, where x is computed from a 64-bit integers. There are no bigints involved.
Well, there are variable length integers involved, since it's as part of a way of representing variable length integers. But it seems you're right that x is just a regular machine int in this context, as an implementation detail. I misunderstood.