by drkrab
46 days ago
Since the CPU always fetches a full cache line (64 bytes) at a time, you might as well search the entire cache line; it's practically free once the data is on the CPU. So I'd like to try a 'binary' search that tests all the values in the 'middle cache line' and then chooses to go left or right if none match. You can do the cache-line search with a single 512-bit SIMD instruction. A cache line is 64 bytes (or 32 16-bit integers), so such a search might well be almost 32 times faster than simple binary search; at least it'll do 32x fewer memory accesses, which will dominate in most realistic programs.
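For what it's worth, here is a rough sketch of that idea in plain C, not a tuned implementation. The function name `line_search`, the constant `KEYS_PER_LINE`, and the restriction that the array length is a multiple of 32 are all my own assumptions for illustration. The inner scan is an ordinary loop; on hardware with 512-bit SIMD it could presumably be replaced by a single vector compare, and the array would need to be 64-byte aligned for each group of 32 keys to actually sit in one cache line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: 64-byte cache line / 2-byte keys = 32 keys per line. */
enum { KEYS_PER_LINE = 32 };

/* Binary search over cache-line-sized groups of a sorted uint16_t array.
 * Returns the index of `key`, or -1 if it is absent.
 * Assumption: n is a multiple of KEYS_PER_LINE. */
ptrdiff_t line_search(const uint16_t *a, size_t n, uint16_t key)
{
    assert(n % KEYS_PER_LINE == 0);

    size_t lo = 0;                   /* first candidate line (inclusive) */
    size_t hi = n / KEYS_PER_LINE;   /* last candidate line (exclusive)  */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const uint16_t *line = a + mid * KEYS_PER_LINE;

        if (key < line[0]) {
            hi = mid;                /* key can only be in a line to the left  */
        } else if (key > line[KEYS_PER_LINE - 1]) {
            lo = mid + 1;            /* key can only be in a line to the right */
        } else {
            /* Key lies within this line's range: test all 32 values.
             * This loop is the part a SIMD compare could replace. */
            for (size_t i = 0; i < KEYS_PER_LINE; i++) {
                if (line[i] == key)
                    return (ptrdiff_t)(mid * KEYS_PER_LINE + i);
            }
            return -1;               /* in range, but not present */
        }
    }
    return -1;
}
```

Each iteration of the outer loop touches one line of the array, so the number of lines fetched depends on the number of lines rather than the number of keys.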
For 4-byte keys and 4-byte child pointers (or indexes into an array), your inner nodes would have 7 keys, 8 child pointers and 1 next pointer, completely filling a 64-byte cache line. Your tree depth for 1 million entries would go down from ~20 to ~7, and the top few levels are likely to remain cache-resident.
With some thought, it's possible to use SIMD on B-tree nodes to speed up the search within the node, but it's all very data-dependent.
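To make the arithmetic concrete, here is a minimal sketch of such a node in C. The names `inner_node`, `child_slot` and `levels_needed` are hypothetical, and the convention that a key equal to a separator goes to the right-hand child is my own assumption. The comparison loop has no data-dependent branches, which is the kind of code a compiler (or hand-written SIMD) can turn into a vector compare.

```c
#include <assert.h>
#include <stdint.h>

/* 7 keys + 8 child indexes + 1 next index, 4 bytes each = 64 bytes. */
typedef struct {
    uint32_t keys[7];      /* sorted separator keys             */
    uint32_t children[8];  /* indexes into a node array         */
    uint32_t next;         /* index of the next node, as above  */
} inner_node;

static_assert(sizeof(inner_node) == 64, "node should fill one cache line");

/* Pick the child to descend into by counting separators <= key.
 * Assumed convention: a key equal to a separator goes right. */
unsigned child_slot(const inner_node *n, uint32_t key)
{
    unsigned slot = 0;
    for (int i = 0; i < 7; i++)
        slot += (key >= n->keys[i]);
    return slot;           /* 0..7, an index into n->children */
}

/* Number of levels needed so that fanout^levels >= entries. */
unsigned levels_needed(uint64_t entries, uint64_t fanout)
{
    unsigned levels = 0;
    uint64_t capacity = 1;
    while (capacity < entries) {
        capacity *= fanout;
        levels++;
    }
    return levels;
}
```

With these definitions, `levels_needed(1000000, 2)` gives 20 and `levels_needed(1000000, 8)` gives 7, matching the depths quoted above.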