|
|
|
|
|
by manimino
1371 days ago
|
|
As far as the bitmaps holdup, well. Let's see. The slowest step in ducks right now is combining the results of a multi-index query. Suppose I get 1000 objects matching one attribute and another 1000 objects from another, I've now got to intersect those two sets. Right now I'm using sets, or in some cases, sorted lists that act as sets [1]. But intuitively, a bitwise "and" ought to be faster, due to SIMD etc. It's really just a matter of thinking about how to write it. This video was quite helpful as it mentioned that Postgres uses a similar approach, so probably I'll have a look at their implementation first and take inspiration from there. [1] https://ducks.readthedocs.io/en/latest/how_it_works.html#sor... |
|
I worked on this exact problem today.
This is probably super obvious and I bet you're already doing something similar (or smarter still), but I'm excited about it so I'm telling you anyway.
I was doing the naive O(m log n) algorithm where you fetch one set of items and then test them each against the other index one by one (both implemented as B-trees).
I got an enormous speedup[1] after I realized that you can get a discount on index look-ups by testing a sorted list of multiple items in the same tree traversal operation. Not only does each item tested allow you to reduce the search scope, you can often omit repeating several of the calculations closer to the root.
[1] Eventually 10x, but that's in part due to naively parallelizing the workload. The single threaded approach was still something like 4-5x compared to the original approach. 20k ops/s for thousands of items tested against a 800 Mb index.