Hacker News
by hamandcheese 604 days ago
It seems intuitive to me that a correct streaming join is impossible without an infinite buffer and strong guarantees on how events are ordered. The number of real world systems offering both of those guarantees is zero. Anyone espousing streaming joins as a general solution should be avoided at all costs, particularly if they have a title that contains "architect" or "enterprise" (god forbid both in the same title).

At best, it is a trick to be applied in very specific circumstances.
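The buffering intuition can be made concrete with a minimal sketch of a symmetric hash join, the textbook way to join two unbounded streams on equality. This is an illustration only: the class and function names are hypothetical, and all state is assumed to fit in in-memory dicts. Each side keeps every record it has seen, because a record that has not matched yet may still match one that arrives later on the other side. As a result, state only ever grows.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Incremental equi-join of two unbounded streams (illustrative sketch).

    Every record is retained indefinitely: a record that has not matched yet
    may still match one that arrives later on the other side.
    """

    def __init__(self):
        self.left = defaultdict(list)   # key -> left values seen so far
        self.right = defaultdict(list)  # key -> right values seen so far

    def push_left(self, key, value):
        self.left[key].append(value)
        # Probe everything already buffered on the right for this key.
        return [(key, value, right_value) for right_value in self.right.get(key, [])]

    def push_right(self, key, value):
        self.right[key].append(value)
        # Probe everything already buffered on the left for this key.
        return [(key, left_value, value) for left_value in self.left.get(key, [])]

    def state_size(self):
        """Number of buffered records; nothing is ever evicted."""
        return sum(map(len, self.left.values())) + sum(map(len, self.right.values()))


def join_all(events):
    """Feed (side, key, value) events in order; return sorted output and final state size."""
    join = SymmetricHashJoin()
    out = []
    for side, key, value in events:
        push = join.push_left if side == "L" else join.push_right
        out.extend(push(key, value))
    return sorted(out), join.state_size()
```

Feeding the same events forward and in reverse yields the same set of output rows for this plain equi-join. In both cases, the buffered state equals the total number of records seen.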


A streaming join does indeed require an unbounded buffer in the most general case, where inputs keep growing and any input record on one side of the join can match any record on the other side. However, it does not require inputs to be ordered. An incremental query engine such as Feldera or Materialize can handle out-of-order data and still offer strong consistency guarantees (disclaimer: I am a developer of Feldera). In practice, unbounded buffers can often be avoided as well. This may require a specialized join, such as an as-of join (https://www.feldera.com/blog/asof-join), plus some garbage-collection (GC) machinery.
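Here is a minimal Python sketch of an as-of join with watermark-based GC, under stated assumptions. The `AsOfJoin` class and its methods are hypothetical. They are not a Feldera or Materialize API, and this is not how either engine implements the operator. The sketch uses the usual "backward" as-of semantics: each left record matches the right record with the same key and the greatest timestamp at or before its own. It also assumes a watermark contract: after `advance(w)`, no record with a timestamp `<= w` arrives on either input. Above the watermark, records may arrive in any order.

```python
import bisect
from collections import defaultdict


class AsOfJoin:
    """Watermark-driven as-of join (hypothetical sketch, not an engine API).

    Each left record (key, ts, value) is paired with the value of the right
    record that has the same key and the greatest timestamp <= ts, or None.
    Assumed contract: after advance(w), no record with ts <= w arrives.
    """

    def __init__(self):
        self.watermark = float("-inf")
        self.pending_left = []  # (ts, key, value) whose match is not final yet
        # key -> (sorted timestamps, values in the same order)
        self.right = defaultdict(lambda: ([], []))

    def push_left(self, key, ts, value):
        assert ts > self.watermark, "record is behind the watermark"
        self.pending_left.append((ts, key, value))

    def push_right(self, key, ts, value):
        assert ts > self.watermark, "record is behind the watermark"
        times, values = self.right[key]
        i = bisect.bisect_right(times, ts)  # out-of-order arrivals are slotted into place
        times.insert(i, ts)
        values.insert(i, value)

    def advance(self, w):
        """Emit left records whose match is now final, then garbage-collect right state."""
        self.watermark = max(self.watermark, w)
        w = self.watermark
        ready = sorted((r for r in self.pending_left if r[0] <= w), key=lambda r: r[0])
        self.pending_left = [r for r in self.pending_left if r[0] > w]
        out = []
        for ts, key, value in ready:
            times, values = self.right.get(key, ([], []))
            i = bisect.bisect_right(times, ts)
            out.append((key, ts, value, values[i - 1] if i else None))
        # Every future left record has ts > w, so among right rows with
        # ts <= w only the newest one per key can still be matched.
        for times, values in self.right.values():
            i = bisect.bisect_right(times, w)
            if i > 1:
                del times[:i - 1]
                del values[:i - 1]
        return out

    def state_size(self):
        return len(self.pending_left) + sum(len(t) for t, _ in self.right.values())
```

For example, suppose a right record at timestamp 3 arrives after a left record at timestamp 4. It is still picked as that record's match once the watermark passes 4, and older right rows for the key are dropped. The trade-off in this sketch is latency, since a result is emitted only when the watermark makes it final. An engine that supports retractions could instead emit early results and correct them later.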