
by rbranson 2527 days ago

The order of operations is: check RocksDB first, then write to Kafka, then write to RocksDB.

If the write to Kafka fails, it re-positions itself in the input topic stream based on the offset annotation in the output topic's last message. The write never went to RocksDB, so it won't be considered a duplicate.

Recovering from a failed RocksDB write is more complicated. The output topic's last message will carry an offset that is effectively beyond the state accumulated in RocksDB. The last input topic offset for each committed message is written to RocksDB in the same transaction as the message itself. The recovery process uses this offset as its starting point when consuming the input topic. During recovery, messages aren't published to the output topic until the offset read from the output topic's last message has been reached.
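To make the two failure cases concrete, here is a rough in-memory sketch of that flow. It is only an illustration of the description above, not the actual implementation: plain Python lists and a set stand in for the Kafka topics and RocksDB, and the names `Deduper`, `process`, `recover`, `fail_kafka` and `fail_db` are all made up for the example.

```python
class Deduper:
    """Toy model: lists stand in for Kafka topics, a set for RocksDB keys."""

    def __init__(self, input_log):
        self.input_log = input_log  # input topic: message ids, index = offset
        self.output_log = []        # output topic: (message id, input offset)
        self.seen = set()           # "RocksDB": ids already committed
        self.db_offset = -1         # "RocksDB": input offset of last commit

    def process(self, offset, publish=True, fail_kafka=False, fail_db=False):
        msg_id = self.input_log[offset]
        if msg_id in self.seen:     # 1. check RocksDB first
            return
        if publish:
            if fail_kafka:
                raise OSError("simulated Kafka write failure")
            # 2. write to Kafka, annotated with the input offset
            self.output_log.append((msg_id, offset))
        if fail_db:
            raise OSError("simulated RocksDB write failure")
        # 3. write id and input offset to RocksDB together (one transaction)
        self.seen.add(msg_id)
        self.db_offset = offset

    def recover(self):
        # Offset annotation on the output topic's last message.
        last_out = self.output_log[-1][1] if self.output_log else -1
        # Replay input from just after RocksDB's offset. Publishing is
        # suppressed until we pass what the output topic already holds.
        for offset in range(self.db_offset + 1, len(self.input_log)):
            self.process(offset, publish=offset > last_out)


d = Deduper(["a", "b", "a", "c", "b", "d"])
d.process(0)
d.process(1)
d.process(2)                        # duplicate "a", skipped
try:
    d.process(3, fail_db=True)      # "c" reaches Kafka but not RocksDB
except OSError:
    d.recover()
published = [msg_id for msg_id, _ in d.output_log]
```

In the failed-RocksDB case, recovery replays offset 3 with publishing suppressed, so "c" is added to the state without being written to the output a second time. In the failed-Kafka case nothing was written anywhere, so the replay simply publishes the message as if it were new.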

1 comment

jeffail 2527 days ago

That makes more sense, thanks for clarifying.

Still, assuming there are no other edge cases there, it doesn't address the other problem: a hypothetical consumer of the output topic is reading an at-least-once feed of your exactly-once topic. To avoid that, the consumer must also be idempotent, in which case what value was gained from the original deduplication?
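A small sketch of the concern, under made-up assumptions: `consume` is a hypothetical at-least-once reader that applies a message and then commits its offset, and `crash_before_commit_at` simulates one crash between those two steps. Even though the feed itself contains no duplicates, a non-idempotent consumer ends up applying one message twice.

```python
def consume(feed, apply, crash_before_commit_at=None):
    """Read `feed` at-least-once: apply each message, then commit its offset."""
    committed = 0       # next offset to read; only advanced after apply()
    crashed = False
    while committed < len(feed):
        apply(feed[committed])
        if committed == crash_before_commit_at and not crashed:
            # Simulated crash after apply() but before the offset commit:
            # on restart the same message is read and applied again.
            crashed = True
            continue
        committed += 1


feed = ["a", "b", "c"]  # already deduplicated ("exactly-once") topic

applied = []            # non-idempotent consumer: appends every delivery
consume(feed, applied.append, crash_before_commit_at=1)

seen = set()            # idempotent consumer: re-applying "b" is a no-op
consume(feed, seen.add, crash_before_commit_at=1)
```

Here `applied` ends up with "b" twice while `seen` holds each message once, which is the point of the question: the downstream consumer needs its own idempotency either way.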
