

by sarwarbproton 485 days ago

Disclaimer: I work for Timeplus in the field team.

This is exactly the kind of problem we've been solving with a few of our customers. With Timeplus, we can listen to Kafka and then do streaming joins to create denormalized records to send downstream to ClickHouse. Traditionally we did this with stream processing, which would build up really large join state in memory when the cardinality of the join keys got very large (think 100s of millions of keys).

This has recently been improved with two additional enhancements:

1. You can set up the join state to use hybrid memory/disk-based hash tables in Timeplus Enterprise if you still want to keep the join happening locally (assuming all data in the join is still coming in via Kafka) while maintaining high throughput.

2. Alternatively, where you have slowly changing data on the right-hand side(s), we can use a Kafka topic on the left-hand side and do direct lookups against MySQL/Postgres/etc. on each change on the LHS. This takes a hit on throughput but may be OK for, say, 100s of records per second per join. There's an additional caching capability with TTL here that keeps the most frequently accessed reference data local so that future joins are faster.
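To make option 2 concrete, here is a rough Python sketch of the general idea: a lookup join fronted by a TTL cache. This is not Timeplus code or its API. `TTLCache`, `lookup_join`, `fetch_reference` and the `customer_id` key are all names made up for illustration, and `fetch_reference` stands in for whatever query you would run against MySQL/Postgres.

```python
import time
from collections import OrderedDict

_MISSING = object()  # sentinel so a cached "no such row" differs from a cache miss


class TTLCache:
    """Small LRU cache whose entries expire after ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds, max_entries=10_000, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.max_entries = max_entries
        self.clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return _MISSING
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._entries[key]  # stale entry: force a fresh lookup
            return _MISSING
        self._entries.move_to_end(key)  # mark as recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self.clock() + self.ttl_seconds, value)
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used key


def lookup_join(events, fetch_reference, cache, key_field="customer_id"):
    """Enrich each left-hand event with its reference row (left-join semantics)."""
    for event in events:
        key = event[key_field]
        ref = cache.get(key)
        if ref is _MISSING:
            ref = fetch_reference(key)  # assumed: one round trip to the database
            cache.put(key, ref)
        yield {**event, **(ref or {})}
```

The TTL is the knob that trades staleness for throughput: a longer TTL means fewer round trips to the database, but changes on the right-hand side take longer to show up in the joined output.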

One additional benefit of using Timeplus to send data downstream to ClickHouse is being able to batch appropriately so that it does not emit lots of small writes to ClickHouse.
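For anyone wondering what "batch appropriately" looks like in practice, the usual pattern is to flush on whichever comes first, a row-count limit or a time limit. The sketch below is only an illustration of that pattern, not how Timeplus implements it. `MicroBatcher` and its parameters are hypothetical, and `flush` is whatever function performs the bulk insert into ClickHouse.

```python
import time


class MicroBatcher:
    """Buffer rows and hand them to flush() in batches (hypothetical helper)."""

    def __init__(self, flush, max_rows=1000, max_wait_seconds=1.0, clock=time.monotonic):
        self._flush = flush  # assumed: callable that writes one batch downstream
        self.max_rows = max_rows
        self.max_wait_seconds = max_wait_seconds
        self.clock = clock
        self._rows = []
        self._first_row_at = None

    def add(self, row):
        if not self._rows:
            self._first_row_at = self.clock()  # start the age timer for this batch
        self._rows.append(row)
        too_big = len(self._rows) >= self.max_rows
        too_old = self.clock() - self._first_row_at >= self.max_wait_seconds
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self._rows:
            batch, self._rows = self._rows, []
            self._flush(batch)
```

Note that this version only checks the age limit when a new row arrives, so a real sink would also want a background timer (and a final `flush()` on shutdown) to avoid holding the last few rows indefinitely.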