by Sirupsen 1292 days ago
Are you aware of a good write-up on how ClickHouse and other columnar databases do the intersection?

ClickHouse uses a single primary key index, which matches the sort order, plus skip indexes, which rule out blocks that cannot contain matching rows so they don't have to be scanned. Here's a write-up that explains skip indexes.

https://altinity.com/blog/clickhouse-black-magic-skipping-in...
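As a rough illustration of the skip-index idea, here is a minimal Python sketch of a min/max index. It assumes the column is split into fixed-size blocks (loosely analogous to what ClickHouse calls granules) and that each block records only its minimum and maximum value. The names `build_minmax_index`, `blocks_to_scan`, `range_query`, and the tiny `BLOCK_SIZE` are hypothetical; this is not ClickHouse's actual implementation.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4  # hypothetical rows per block; kept tiny for illustration


@dataclass
class BlockStats:
    lo: int  # smallest value in the block
    hi: int  # largest value in the block


def build_minmax_index(column, block_size=BLOCK_SIZE):
    """Record the min and max value of each fixed-size block of the column."""
    index = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        index.append(BlockStats(min(block), max(block)))
    return index


def blocks_to_scan(index, lo, hi):
    """Ids of blocks whose [min, max] range overlaps lo <= x <= hi.
    Every other block provably has no match and is never read."""
    return [i for i, s in enumerate(index) if s.hi >= lo and s.lo <= hi]


def range_query(column, index, lo, hi, block_size=BLOCK_SIZE):
    """Scan only the surviving blocks and return matching row positions."""
    rows = []
    for b in blocks_to_scan(index, lo, hi):
        start = b * block_size
        for pos in range(start, min(start + block_size, len(column))):
            if lo <= column[pos] <= hi:
                rows.append(pos)
    return rows


col = [1, 3, 2, 4, 10, 12, 11, 13, 5, 6, 7, 8]
idx = build_minmax_index(col)
print(blocks_to_scan(idx, 6, 9))    # [2]: blocks 0 and 1 are skipped
print(range_query(col, idx, 6, 9))  # [9, 10, 11]
```

In this sketch, a query filtering on two columns that each have such an index could intersect the two sets of surviving block ids before scanning anything, so only blocks that pass both indexes are read.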

You can also check out the following webinar, which explains how ClickHouse indexes work in general. Here's a link to the discussion of indexes.

https://youtu.be/1TGGCIr6dMY?t=1933

P.S. The blog article is missing some images that WordPress seems to have lost, but you'll still get the idea. (Should be fixed shortly.)

Disclaimer: I work for Altinity

Not in particular, sorry. Most of the good content I've found is on the Altinity [1] and Alibaba [2][3] technical blogs. These tend to focus mostly on how the data itself is stored and how to use ClickHouse, but they don't really dive into the specifics of how query processing is performed.

[1] https://altinity.com/blog/

[2] https://www.alibabacloud.com/blog/clickhouse-kernel-analysis...

[3] https://www.alibabacloud.com/blog/clickhouse-analysis-of-the...

One obvious way is to build, for each filter, a bitmap indexed by row position. Both the bitwise AND ("&") that intersects the bitmaps and the final bit count (popcount) can be extremely fast on modern CPU vector units.
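A minimal Python sketch of that bitmap approach, assuming each bitmap is a list of 64-bit words held in Python ints (a real engine would use fixed-width machine words and SIMD instructions). The helper names are hypothetical, and this is not a description of how ClickHouse itself evaluates filters.

```python
WORD_BITS = 64  # bits per word; stands in for a 64-bit machine word


def build_bitmap(column, predicate):
    """Bit i is set iff predicate(column[i]) holds (bitmap indexed by row position)."""
    words = [0] * ((len(column) + WORD_BITS - 1) // WORD_BITS)
    for pos, value in enumerate(column):
        if predicate(value):
            words[pos // WORD_BITS] |= 1 << (pos % WORD_BITS)
    return words


def intersect(a, b):
    """Word-by-word AND; a vectorized engine would process several words per instruction."""
    return [x & y for x, y in zip(a, b)]


def count_rows(bitmap):
    """Popcount over all words = number of rows passing every filter."""
    return sum(bin(w).count("1") for w in bitmap)


def matching_rows(bitmap):
    """Decode the set bits back into row positions."""
    rows = []
    for w_idx, w in enumerate(bitmap):
        while w:
            low = w & -w  # isolate the lowest set bit
            rows.append(w_idx * WORD_BITS + low.bit_length() - 1)
            w ^= low      # clear it and continue
    return rows


ages = [25, 40, 31, 19, 52, 33]
countries = ["US", "DE", "US", "US", "DE", "US"]
both = intersect(build_bitmap(ages, lambda v: v >= 30),
                 build_bitmap(countries, lambda v: v == "US"))
print(count_rows(both), matching_rows(both))  # 2 [2, 5]
```

Once the per-filter bitmaps exist, the AND and the count each touch one word per 64 rows rather than one value per row, which is what makes this approach a good fit for vector units.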