Hacker News new | ask | show | jobs
by zX41ZdbW 1271 days ago
ClickHouse updating materialized views in realtime.

How it is done:

1. Intermediate states of aggregate functions are first-class data types in ClickHouse. You can create a data type for the intermediate state of count(distinct), or quantile, whatever, and it will store a serialized state. It can be inserted into a table, read back, merged or finalized, and processed further.

2. Materialized views are simple triggers on INSERT. It enables incremental aggregation in realtime with materialized views.

Downsides:

Limited area of application. Does not respect multi-table queries if updates are done on more than one table.

Disclaimer: I'm developer of ClickHouse: https://github.com/ClickHouse/ClickHouse

1 comments

How does Clickhouse handle downstream aggregations? E.g...

  SELECT AVG(col) FROM (SELECT MAX(col2) AS col FROM table GROUP BY col3) t;
type query? Essentially, materialized views which are very unstable? Flink has its retract streams where rows can be semantically removed from the output and downstream query plans can understand deletes, but my expectation has been that these have bad worst case performance.
It does not handle deletes / updates of the source table at all - it means the data of the materialized view become unaffected. Nothing sophisticated...

About the multi-level aggregate queries - the only way is to define a materialized view for the inner query and do calculations on top of it on the fly. So, the data for the inner query will be pre-aggregated, but nothing on top of that.