by tombert 598 days ago
A large part of my job over the last few months has been figuring out how to optimize joins in Kafka Streams.

Kafka Streams, by default, uses either RocksDB or an in-memory store for the join buffer. That works, but it completely devours your RAM, so I have been writing something more tuned to our workload that uses Postgres as the state store.

It works, but optimizing JOINs is almost as much of an art as it is a science. Trying to optimize caches and predict stuff so you can minimize the cost of latency ends up being a lot of “guess and check” work, particularly if you want to keep memory usage reasonable.
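To make the cache trade-off concrete, here is a minimal sketch of a bounded read-through LRU cache sitting in front of a slow state store. This is not the setup described above: `ReadThroughCache`, `slow_lookup`, and `BACKING_STORE` are hypothetical names, and a plain dict stands in for the Postgres-backed store so the example runs on its own.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Optional


class ReadThroughCache:
    """Bounded LRU cache in front of a slow key-value lookup (illustrative only)."""

    def __init__(self, load: Callable[[Hashable], Optional[dict]], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._load = load          # called on a miss, e.g. a query against the store
        self._capacity = capacity  # upper bound on cached keys, i.e. the memory budget
        self._entries: "OrderedDict[Hashable, Optional[dict]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[dict]:
        if key in self._entries:
            # Hit: mark the key as most recently used and skip the store.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Miss: pay the round trip, then remember the result (including None).
        self.misses += 1
        value = self._load(key)
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            # Evict the least recently used key to stay within the budget.
            self._entries.popitem(last=False)
        return value


# Hypothetical stand-in for the external state store.
BACKING_STORE = {"a": {"n": 1}, "b": {"n": 2}, "c": {"n": 3}}
store_reads = []


def slow_lookup(key: Hashable) -> Optional[dict]:
    store_reads.append(key)  # record every round trip to the store
    return BACKING_STORE.get(key)


cache = ReadThroughCache(slow_lookup, capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)

print(cache.hits, cache.misses, store_reads)
```

With a capacity of 2, the second read of "b" misses because "c" pushed it out. That is the "guess and check" part: the hit rate depends on how well the capacity and eviction policy match the key access pattern, and raising the capacity trades memory for fewer round trips.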

1 comment

Can you explain why streaming joins are necessary? All the examples I've seen are bad. For example, joining books and authors as a stream seems ridiculous. Why couldn't the author come up with a more realistic example?
Imagine a system monitoring payment transactions. Each event in the transaction stream (e.g., a purchase) could be joined with customer account data (e.g., past purchasing patterns or blacklist flags). A streaming join lets you flag potentially fraudulent transactions as they arrive, using that live context.
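A rough sketch of the payment example as a stream-table join, in plain Python rather than the Kafka Streams API. Everything here is an assumption made for illustration: the `Account`, `Transaction`, and `Flagged` types, the `typical_max_amount` field, and the three flagging rules. A dict plays the role of the customer table that a real system would keep updated from its own topic.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Account:
    customer_id: str
    blacklisted: bool
    typical_max_amount: float  # hypothetical summary of past purchasing patterns


@dataclass(frozen=True)
class Transaction:
    tx_id: str
    customer_id: str
    amount: float


@dataclass(frozen=True)
class Flagged:
    tx_id: str
    reason: str


def flag_transactions(
    transactions: Iterable[Transaction],
    accounts: Dict[str, Account],
) -> Iterator[Flagged]:
    """Join each transaction with the current account row and flag suspicious ones."""
    for tx in transactions:
        # The join: look up the table side by the stream event's key.
        account = accounts.get(tx.customer_id)
        if account is None:
            yield Flagged(tx.tx_id, "unknown customer")
        elif account.blacklisted:
            yield Flagged(tx.tx_id, "blacklisted")
        elif tx.amount > account.typical_max_amount:
            yield Flagged(tx.tx_id, "unusual amount")


# Table side: latest known state per customer.
accounts = {
    "c1": Account("c1", blacklisted=False, typical_max_amount=100.0),
    "c2": Account("c2", blacklisted=True, typical_max_amount=500.0),
}

# Stream side: purchase events in arrival order.
transactions = [
    Transaction("t1", "c1", 20.0),
    Transaction("t2", "c1", 950.0),
    Transaction("t3", "c2", 10.0),
    Transaction("t4", "c9", 5.0),
]

flagged = list(flag_transactions(transactions, accounts))
for item in flagged:
    print(item.tx_id, item.reason)
```

The lookup happens per event, at the moment the event arrives, so a blacklist flag set a second ago already affects the next purchase. A batch join over yesterday's data would not see it until the next run.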