| Author here. For a perfect drop-in replacement of std::lower_bound, the best you can do without breaking anything is to make the search branchless and maybe add some prefetching [1]. Some compilers actually try to implement std::lower_bound this way, but these things can sometimes break from version to version because there are no semantics to reliably make compiler use predication instead of branching. Your intuitions are very much correct: you can get rid of pointers in a B-tree and make it static and implicit and fast, especially if you use SIMD to search for the lower bound within a node [2], but it would technically not be a replacement to std::lower_bound as we need to build an additional structure (even though it's very hard to imagine a scenario where you obtain a sorted array but can’t afford to spend linear time on preprocessing). C++23 has since added std::flat_set, which seems to be an appropriate place to implement it (in the article I compared against std::lower_bound because neither I nor the vast majority of the readers knew what std::flat_set was). You can also add pointers back to support insertion and deletion with a moderate penalty to performance [3], but this dynamic B-tree is also technically not a replacement to std::set because of the extra pointer invalidations when a node merges or splits (even though in most cases you don't need pointer stability). You can fix it by, e.g., storing separate pairs of pointers so that each iterator knows where its key is in the tree and vice versa. That would add some overhead (especially in terms of memory) but make it compliant with the standard and still quite a bit faster and lighter than std::set. The three articles combined are like 50 pages long so for a tl;dr version you might be interested in a talk I did at CppCon [4]. You can also extend the trick for heaps, ropes, segment trees, and other tree-like structures. There is a lot of work to be done here. [1] https://en.algorithmica.org/hpc/data-structures/binary-searc... [2] https://en.algorithmica.org/hpc/data-structures/s-tree/ [3] https://en.algorithmica.org/hpc/data-structures/b-tree/ [4] https://www.youtube.com/watch?v=1RIPMQQRBWk |
> For a perfect drop-in replacement of std::lower_bound, the best you can do without breaking anything is to make the search branchless and maybe add some prefetching [1]. Some compilers actually try to implement std::lower_bound this way, but these things can sometimes break from version to version because there are no semantics to reliably make compiler use predication instead of branching.
What's imperfect about a radix-4 (or higher) search instead of a binary search for replacing std::lower_bound? For a radix-k search, you'll have k-1 fetches in flight before you decide which subarray to recurse on, and your dep chain will be log_2(k) times shorter.