Hacker News new | ask | show | jobs
by jgraettinger1 2029 days ago
This is a solved problem, for a few years now. The basic trick is to publish "pending" messages to the broker which are ACK'd by a later written message, only after the transaction and all it's effects have been committed to stable storage (somewhere). Meanwhile, you also capture consumption state (e.x. offsets) into the same database and transaction within which you're updating the materialization results of a streaming computation.

Here's [1] a nice blog post from the Kafka folks on how they approached it.

Gazette [2] (I'm the primary architect) also solves in with some different trade-offs: a "thicker" client, but with no head-of-line blocking and reduced end-to-end latency.

Estuary Flow [3], built on Gazette, leverages this to provide exactly-once, incremental map/reduce and materializations into arbitrary databases.

[1]: https://www.confluent.io/blog/exactly-once-semantics-are-pos...

[2]: https://gazette.readthedocs.io/en/latest/architecture-exactl...

[3]: https://estuary.readthedocs.io/en/latest/README.html

1 comments

Interesting! I'm going to read into the info you linked. Thanks for the info!