by SloopJon 2356 days ago
I'd say the materialized view is the main thing:

> Thus, we add the following materialized view ... At the end we should have 1440 times fewer rows in the aggregate than the source table.

The cost of populating that view is amortized over the 17.5 hours it took to load the data.
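
For a sense of where the 1440 comes from: it equals the number of minutes in a day, so the ratio fits a view that rolls per-minute readings up to one row per sensor per day. That schema is an assumption here, not something quoted above, and the sensor counts and temperatures below are made up. A minimal Python sketch of the row-count arithmetic:

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60  # 1440

def daily_rollup(readings):
    """Collapse (sensor_id, minute_index, temp) rows to one max per sensor per day."""
    agg = defaultdict(lambda: float("-inf"))
    for sensor_id, minute_index, temp in readings:
        day = minute_index // MINUTES_PER_DAY
        agg[(sensor_id, day)] = max(agg[(sensor_id, day)], temp)
    return dict(agg)

# Hypothetical data: 3 sensors, 2 days, one reading per minute.
readings = [
    (s, m, 20.0 + (m % 60) / 10.0)
    for s in range(3)
    for m in range(2 * MINUTES_PER_DAY)
]
rollup = daily_rollup(readings)
ratio = len(readings) // len(rollup)
print(len(readings), len(rollup), ratio)  # 8640 6 1440
```

The ratio stays at 1440 however many sensors or days you add, as long as every sensor reports once a minute.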

2 comments

Mat views are great as the article showed. I use them to get query response down to milliseconds, as they vastly reduce the amount of data ClickHouse must scan.

That said, there are a lot of other tools: column storage, vectorized query execution, efficient compression including column codecs, and skip indexes, to name a few. If you only have a few billion rows, it's still possible to get sub-second query results using brute-force scans.
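
To make the skip index idea concrete, here is a toy Python sketch in the spirit of a min/max data-skipping index: keep the min and max of each block of rows and skip any block that cannot contain a match. The function names and block size are hypothetical, and this is not how ClickHouse implements it internally, only the general principle.

```python
def build_minmax_index(values, granule_size):
    """Record (start, end, min, max) for each fixed-size block of rows."""
    index = []
    for start in range(0, len(values), granule_size):
        block = values[start:start + granule_size]
        index.append((start, start + len(block), min(block), max(block)))
    return index

def scan_greater_than(values, index, threshold):
    """Return matching values and the number of blocks actually read."""
    matches, granules_read = [], 0
    for start, end, lo, hi in index:
        if hi <= threshold:
            continue  # no row in this block can match, so skip it
        granules_read += 1
        matches.extend(v for v in values[start:end] if v > threshold)
    return matches, granules_read

values = list(range(1000))  # sorted data, where min/max pruning works best
index = build_minmax_index(values, granule_size=100)
matches, granules_read = scan_greater_than(values, index, threshold=949)
print(len(matches), granules_read, len(index))  # 50 1 10
```

On sorted or clustered data most blocks get pruned; on randomly ordered data nearly every block overlaps the predicate and you are back to a full scan.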

Disclaimer: I work for Altinity, who wrote this article.

p.s. Loading the view is low-cost compared to loading the source data. On the NUC it's 40-60 minutes, so worst case it's something like 1h / 17.5h = 5.71%. Also, you can still query the source data. That is fast for individual sensors as the examples showed.
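
The overhead arithmetic above, worked for both ends of the 40-60 minute range. The numbers are the ones quoted in this thread; the helper name is just for illustration.

```python
SOURCE_LOAD_HOURS = 17.5       # source data load time quoted in the thread
VIEW_LOAD_MINUTES = (40, 60)   # view load time range quoted in the thread

def overhead_pct(view_minutes, source_hours):
    """View load time as a percentage of source load time."""
    return 100.0 * view_minutes / (source_hours * 60.0)

low, high = (overhead_pct(m, SOURCE_LOAD_HOURS) for m in VIEW_LOAD_MINUTES)
print(f"{low:.2f}% to {high:.2f}%")  # 3.81% to 5.71%
```
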

Yeah, I was confused: I couldn't tell what was precomputed stats (col min/max/count), what was view calcs, and what was actual perf -- even legacy SQL vendors do all of those. That's apples/oranges, and more a statement against the other db than one for ClickHouse. Likewise, the db comparison I'd like to see is against _other columnar stores_.

I know some folks running one of the larger clickhouse instances out there... but this article made me trust the community less, not more.