by jeffail
2525 days ago
"What’s more, we want to ensure the information about which events we’ve seen is written durably so we can recover from a crash, and that we never produce duplicate messages in our output."

Your processor is described as reading from Kafka, writing to Kafka, and checking message identifiers against a persisted RocksDB instance. How, then, do you ensure messages aren't dropped if the processor crashes or is killed after checking (and recording) an ID in RocksDB but before the message is flushed to the Kafka broker? Also, isn't your producer writing to Kafka at-least-once? If so, then even if the processing stage removes every duplicate, a producer retry can still write the same message twice, so the output topic could still contain duplicates. By contrast, deduplicating on consumption avoids that problem entirely by building an idempotent consumer, which gives you effectively exactly-once processing. That said, in this case they have identified edge cases of duplicates they're comfortable with.
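To make the "deduplicate on consumption" idea concrete, here is a minimal in-memory sketch. It is not based on the article's code or any Kafka client API: `IdempotentConsumer`, `handle`, and `at_least_once_feed` are hypothetical names, and the list of tuples stands in for an at-least-once topic in which a producer retry has written message `b` twice. The key assumption is that the "seen IDs" record and the applied effect are committed together, so neither a producer retry nor a post-crash redelivery can apply a message twice.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Hypothetical consumer that deduplicates at the point of consumption."""

    # In a real system both of these would live in the same durable store
    # (e.g. one database transaction) so they can never diverge after a crash.
    seen_ids: set = field(default_factory=set)
    applied: list = field(default_factory=list)

    def handle(self, msg_id: str, payload: str) -> bool:
        """Apply a message once; return False if it was a duplicate."""
        if msg_id in self.seen_ids:
            return False  # redelivery or producer retry: skip it
        # Assumption: these two updates commit atomically.
        self.applied.append(payload)
        self.seen_ids.add(msg_id)
        return True


def at_least_once_feed():
    # Simulated topic: the producer retried "b" after a timeout, so it
    # appears twice even though the upstream processor deduplicated.
    return [("a", "x"), ("b", "y"), ("b", "y"), ("c", "z")]


consumer = IdempotentConsumer()
for msg_id, payload in at_least_once_feed():
    consumer.handle(msg_id, payload)

# Simulate a crash after applying everything but before committing the
# consumer offset: on restart, messages from offset 2 onward are redelivered.
for msg_id, payload in at_least_once_feed()[2:]:
    consumer.handle(msg_id, payload)

print(consumer.applied)
```

Both the duplicate from the retry and the redelivered tail are skipped, so `applied` ends up as `['x', 'y', 'z']`. One design caveat: an unbounded `seen_ids` set grows forever, so practical systems usually keep IDs only for a time window, which is exactly where accepted edge-case duplicates like the ones mentioned above come from.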