by Svenskunganka
977 days ago
I'm not the one who wrote the original comment, so I can't modify it. But one should still commit offsets, because that is the happy path: DB transaction successful? Commit the offset. If the offset commit fails, e.g. due to an application crash, and at startup you seek to the partition offset stored in the DB + 1, you get exactly-once semantics. There are some more details: you'd have to do the same during a consumer group rebalance, and topic configuration also plays a role, for example whether the topic is compacted and, if you write tombstones, what its retention policy is.

edit: You added some more to your comment after I posted this one, so I'll try to cover those points as well:

> One downside is that you leak internals of other system (partitions).

Yeah, sure.

> The other is that it implies serialised processing - you can't process anything in parallel as you have single index threshold that defines what has been and what has yet not been processed.

It doesn't imply serialised processing. It depends on the use case: if every record in a topic has to be processed serially, you can't parallelize, full stop; the number of partitions equals 1. But if each record can be processed individually, you get parallelism equal to the number of partitions configured for the topic.
You also achieve parallelism in the same way if only some records in a topic need to be processed serially: use the same key for those records and they will end up in the same partition. For example, when recording the coordinates of planes, each plane can be processed in parallel, but an individual plane's coordinates need to be processed serially. Just use the plane's unique identifier as the key, and the coordinates for the same plane will be appended to the log of the same partition.
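To make the plane example concrete, here is a minimal in-memory sketch of key-based partitioning. It does not talk to Kafka: the partition count, the `partition_for` and `append_all` helpers, and the sample coordinates are all made up for illustration, and CRC32 is only a stand-in hash (Kafka's default partitioner hashes the key bytes with murmur2).

```python
import zlib

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition; equal keys always map to the same one."""
    # Stand-in hash. Kafka's default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_all(records, num_partitions: int = NUM_PARTITIONS):
    """Append (key, value) records to per-partition logs in arrival order."""
    logs = {p: [] for p in range(num_partitions)}
    for key, value in records:
        logs[partition_for(key, num_partitions)].append((key, value))
    return logs


# Hypothetical coordinate updates for two planes, interleaved in time.
records = [
    ("plane-A", (59.3, 18.0)),
    ("plane-B", (40.7, -74.0)),
    ("plane-A", (59.4, 18.1)),
    ("plane-B", (40.8, -73.9)),
    ("plane-A", (59.5, 18.2)),
]
logs = append_all(records)

# All of plane-A's updates sit in one partition log, in their original order.
plane_a_log = [v for k, v in logs[partition_for("plane-A")] if k == "plane-A"]
print(plane_a_log)
```

Each partition log can then be handed to its own consumer, so different planes may be processed in parallel while one plane's updates stay ordered.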
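The "store the offset in the DB and seek to it + 1 at startup" idea from the comment above can be sketched without a broker. This is only a simulation under stated assumptions: the partition is a plain Python list, SQLite stands in for the application database, and the table names, `start_position`, `consume`, and the `crash_after` switch are all hypothetical. With a real consumer, the resume step would be a seek on the assigned partition, repeated after every rebalance.

```python
import sqlite3


def open_db():
    """Create an in-memory DB holding results and the last processed offset."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE results (partition_id INTEGER, record_offset INTEGER, value TEXT)"
    )
    db.execute(
        "CREATE TABLE offsets (partition_id INTEGER PRIMARY KEY, last_offset INTEGER)"
    )
    return db


def start_position(db, partition_id):
    """Return the stored offset + 1, or 0 if nothing was processed yet."""
    row = db.execute(
        "SELECT last_offset FROM offsets WHERE partition_id = ?", (partition_id,)
    ).fetchone()
    return 0 if row is None else row[0] + 1


def consume(db, partition_id, log, crash_after=None):
    """Process records from the stored position to the end of the log."""
    for offset in range(start_position(db, partition_id), len(log)):
        with db:  # one transaction: the result and the offset commit together
            db.execute(
                "INSERT INTO results VALUES (?, ?, ?)",
                (partition_id, offset, log[offset].upper()),
            )
            db.execute(
                "INSERT OR REPLACE INTO offsets VALUES (?, ?)", (partition_id, offset)
            )
        if offset == crash_after:
            # The DB commit succeeded, but the broker offset commit never happens.
            raise RuntimeError("simulated crash")
        # Happy path: this is where the offset would also be committed to the broker.


log = ["a", "b", "c", "d"]  # stand-in for one partition's records
db = open_db()
try:
    consume(db, 0, log, crash_after=1)
except RuntimeError:
    pass
consume(db, 0, log)  # restart: resumes at stored offset + 1

rows = db.execute(
    "SELECT record_offset, value FROM results ORDER BY record_offset"
).fetchall()
print(rows)
```

Because the result and the offset are written in the same transaction, the restart neither skips nor repeats a record in this simulation, even though the simulated broker commit was lost.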
If one-and-only-one semantics are needed and processing should be parallel, other methods have to be used.