|
|
|
|
|
by ddorian43
3721 days ago
|
|
How about using a bitlist/bitmap/bitset/bitstring/bitarray, maybe even a compressed one, like roaring-bitsets ? Edit: question: is there any type of index better than compressed bitsets when you have per-column index and you want to AND,OR, basically combine them ? |
|
If your index is denser than ~.1 and sparser than ~.9, a simple bitmap is almost certainly the fastest solution for union and intersections, and likely the most compact as well. SIMD is so simple and fast over these that they are hard to beat.
At higher or lower densities than this, "something else" is probably better. The idea of a compressed bitset (like Roaring) is to use a different encoding depending on the density. So in that sense, a compressed bitset is the best you can do.
That said, there is still lots of room for improvement among compressed bitsets. Among other things, I don't think there are any that yet deal well with "inverted selection" for very high densities. Most papers just avoid the issue by assuming "stop lists", which might not be possible in a real world database. Intersections between different density columns using different encodings still has lots of optimization potential too.