|
|
|
|
|
by higeorge13
485 days ago
|
|
I also agree with most of your comments and conclusions. In our setup, we use app ingestion to send all the denormalised data into Clickhouse using async inserts and Debezium/Kafka/Kafka engine and materialized views to sync a few Postgres tables into Clickhouse. 2 of the replicated tables are in the order of billions of rows, and are used in 20% of the queries (usually directly and less frequently with no more than 1-2 joins). Everything else queries the denormalised tables (usually no joins there, only some dictionary usage). Overall query performance is great, although it would have been even better since we use replacing merge trees and final. The 2 main issues that we are facing are: - we need to periodically cleanup the deleted rows from the replacing merge trees, since the application does lots of upserts and deleted rows just stay there. - there is not much flexibility in the ordering keys of the replicated Postgres tables, unless you enable full replica identity. We took that performance hit (although nothing really noticeable in Postgres side) in order to have some flexibility and better query performance in the replicated tables in Clickhouse. |
|
1. For deleted rows, you can create policies to simplify querying. However, periodic deletions are still necessary. We've been optimizing lightweight deletes/updates to improve performance, which should help with automatic deletions.
2. For the second issue, refreshable materialized views with different order keys than raw tables are an option worth considering. However, having it in real time for tables with billions of rows might not be viable. That said, processing within tens of minutes to a few hours could work. We're tracking that the order key serves a dual role—as both a deduplication key and a skip index—which is the root cause of this issue of enabling REPLICA IDENTITY on Postgres side.
Separately, working on a guide covering best practices for Postgres to ClickHouse data modeling, detailing these concepts further. More on this coming soon!