by mirekrusin
977 days ago
You should drop the "(...) and carefully acknowledging them when you are sure data is safely stored in your db (...)" part then, because it isn't necessary: you don't rely on it. One-or-more (at-least-once) delivery plus local deduplication gives one-and-only (exactly-once) semantics. In this case you're optimising local deduplication with a strictly monotonic index. One downside is that you leak internals of the other system (partitions). The other is that it implies serialised processing - you can't process anything in parallel, as you have a single index threshold that defines what has been and what has not yet been processed.
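To make the deduplication point concrete, here is a minimal sketch in Python. It assumes each delivered record carries a partition number and a strictly increasing offset within that partition; the names `OffsetDeduplicator`, `handle` and `demo_dedup` are made up for illustration and are not taken from any real client library. A real consumer would also write the payload and the offset in one database transaction, which this in-memory version only imitates.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: (partition, offset, payload).
Record = Tuple[int, int, str]


class OffsetDeduplicator:
    """Drops redeliveries by remembering the highest offset seen per partition."""

    def __init__(self) -> None:
        self.last_offset: Dict[int, int] = {}  # partition -> last stored offset
        self.stored: List[str] = []            # stand-in for the local database

    def handle(self, record: Record) -> bool:
        partition, offset, payload = record
        # Anything at or below the threshold has already been stored.
        if offset <= self.last_offset.get(partition, -1):
            return False
        # In a real system these two writes would share one transaction.
        self.stored.append(payload)
        self.last_offset[partition] = offset
        return True


def demo_dedup() -> List[str]:
    dedup = OffsetDeduplicator()
    deliveries: List[Record] = [
        (0, 0, "a"),
        (0, 1, "b"),
        (0, 1, "b"),  # redelivered
        (1, 0, "x"),
        (0, 0, "a"),  # redelivered after a restart
        (0, 2, "c"),
    ]
    for record in deliveries:
        dedup.handle(record)
    return dedup.stored
```

Each payload ends up stored once even though two records were delivered twice, and nothing here depends on acknowledgements.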
edit: You added some more to your comment after I posted this one, so I'll try to cover those points as well:
> One downside is that you leak internals of other system (partitions).
Yeah, sure.
> The other is that it implies serialised processing - you can't process anything in parallel as you have single index threshold that defines what has been and what has yet not been processed.
It doesn't imply serialised processing; it depends on the use case. If every record in a topic has to be processed serially, you can't parallelise, full stop: the number of partitions equals 1. But if each record can be processed independently, you get parallelism equal to the number of partitions configured for the topic. You get parallelism in the same way if only some records in a topic need to be processed serially: use the same key for the records that need serial processing and they will end up in the same partition. For example, when recording the coordinates of planes, each plane can be processed in parallel, but an individual plane's coordinates need to be processed serially. Just use the plane's unique identifier as the key and the coordinates for the same plane will be appended to the log of the same partition.
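A rough sketch of the keying idea, in Python with no broker involved. The `partition_for` function uses CRC32 purely as a stand-in for a stable hash; it is an assumption for illustration, not the partitioner any particular broker or client uses. The function names and the plane data are also made up.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

KeyedRecord = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash of the key, so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(records: List[KeyedRecord], num_partitions: int) -> Dict[int, List[KeyedRecord]]:
    # Append each record to its partition's log, preserving arrival order.
    partitions: Dict[int, List[KeyedRecord]] = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


def demo_route() -> Dict[int, List[KeyedRecord]]:
    records: List[KeyedRecord] = [
        ("plane-A", "pos1"),
        ("plane-B", "pos1"),
        ("plane-A", "pos2"),
        ("plane-B", "pos2"),
        ("plane-A", "pos3"),
    ]
    return route(records, num_partitions=4)
```

All coordinates for one plane land in a single partition in the order they were produced, so one worker per partition can run in parallel with the others while each plane is still processed serially.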