| Yeah but the first time it was read the key was stored in RocksDB, so the second time it gets consumed after the crash: "If the message already exists in RocksDB, the worker simply will not publish it to the output topic and update the offset of the input partition, acknowledging that it has processed the message." It gets dropped as if it were a duplicate, oops! Edit: I reread the relevant section... "If a message was found in the output topic, but not RocksDB (or vice-versa) the dedupe worker will make the necessary repairs to keep the database and RocksDB in-sync. In essence, we’re using the output topic as both our write-ahead-log, and our end source of truth, with RocksDB checkpointing and verifying it." Not sure entirely what they mean by "(or vice-versa)", if the message exists in RocksDB but not in the output topic how you distinguish between a real duplicate and a crash artifact? If the last message of a crash happens to be a real duplicate and this recovery mechanism reintroduces it into the pipeline then you have a duplicate. Either way, it's not an exactly-once feed. At best (assuming message loss isn't possible) it's an at-least-once feed that usually appears to be exactly-once. |
If the write to Kafka fails, it re-positions itself in the input topic stream based on the offset annotation in the output topic's last message. The write never went to RocksDB, so it won't be considered a duplicate.
Recovering from a failed RocksDB write is more complicated. The output topic's last message will have an offset that will effectively be beyond the accumulated state in RocksDB. Transactionally the last input topic offset for each committed message is written to RocksDB alongside it. The recovery process uses this offset as a starting point when consuming the input topic. During this process, messages aren't published into the output topic until the offset read from the output topic is reached.