| We've used PeerDB's hosted offering for sync'ing data from Postgres to Clickhouse both pre and post acquisition by Clickhouse Inc. We've also helped test the integrated sync features in Clickhouse Cloud built on top of PeerDB. We're using it to power customer facing features within our product. It works well. Their team is great. I feel a bit spoiled having had as much access to the engineering team during the private beta as we've experienced. It's great for use cases where it makes sense to sync postgres tables across to clickhouse without denormalizing them. PeerDB can transform rows in a single table sent via CDC using a lua scripting language, but it can't (yet!) denormalize data into clickhouse that is stored in 3NF on Postgres across multiple tables. On the clickhouse query side, we end up wanting denormalized data for query performance and to avoid JOINs. It's frequently not a great idea to query in clickhouse using the same table structure as you're using in your transactional db. In our experience we sync a few tables with PeerDB but mostly end up using app-level custom code to sync denormalized data into Clickhouse for our core use-cases. Most of the PeerDB sync'd tables end up as Clickhouse Dictionaries which we then use in our queries. PeerDB works well and I like it for what it is. Just don't expect to be satisfied with querying in Clickhouse against the same table structure as you've got in Postgres unless your data size is tiny. Curious to know about how others are using it and the architectures you've developed. |
Overall, what you shared makes sense for use cases like yours. However, there are other scenarios—such as multi-tenant SaaS analytics running large-scale workloads with PeerDB/PostgreSQL CDC. In these cases there are 100s of tables across different schemas that are synced using CDC. Some customers denormalize tables using materialized views (MVs), which is a powerful feature in ClickHouse, while others power dashboards directly with JOINs using the recent JOIN improvements in ClickHouse and suitable/optimized order keys (tenant_id,id).
When dealing with 100s to 1000s of tables and a heavily relational schema, building dual-write pipelines with denormalization becomes extremely difficult—especially when the workload involves UPDATEs.
We have many customers falling in the above bucket, replicating multiple petabytes of data to ClickHouse. A few customer deep dives on this are coming soon! :)
Side note: We are tracking support for in-transit transformations as a future feature. However, MVs are the way to go—more of an ELT approach.