Hacker News
by UK-Al05 2024 days ago
One way around this is to make sure your Kafka command streams are processed serially and in order, partitioned by the id on which you want concurrency control.

Normally you only want concurrency control within certain boundaries.

By figuring out the minimal transaction and concurrency boundaries you can eke out quite a bit of performance.


Sure, but that defeats the quest for horizontal scalability. You can build highly performant systems based on serial execution, but I'm not sure this is an area where Kafka particularly excels.
That's why you partition by some id. Say, stock SKU id for stock control. Then you can handle other SKUs in parallel; processing is serial only within a single SKU. That's probably the maximum performance you're going to get out of a traditional db anyway.
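A rough sketch of the idea, not Kafka's actual code: hash the SKU id to pick a partition, so every command for one SKU lands in the same ordered queue while different SKUs can spread across partitions. The names `partition_for`, `route` and `NUM_PARTITIONS` are made up for illustration, and CRC32 is a stand-in hash (Kafka's default partitioner hashes keys with murmur2).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(sku_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(sku_id.encode("utf-8")) % num_partitions


def route(commands):
    """Group (sku_id, command) pairs into per-partition queues, keeping arrival order."""
    partitions = defaultdict(list)
    for sku_id, command in commands:
        partitions[partition_for(sku_id)].append((sku_id, command))
    return partitions


commands = [
    ("sku-1", "reserve"),
    ("sku-2", "reserve"),
    ("sku-1", "release"),
    ("sku-2", "ship"),
]
queues = route(commands)
# Each queue can be drained by one worker; commands for a given SKU
# are only ever seen by that worker, in the order they arrived.
```

Each queue is processed serially, so no locking is needed per SKU, and the degree of parallelism is capped by the number of partitions.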
This definitely seems like the "Kafka" way to solve this problem, but I fear there are implications to this partitioning scheme I'd love to see answered. For example, partition counts aren't infinite, and aren't easily adjusted after the fact. So if you choose, say, 10 partitions originally, for a SKU space that is nearly infinite, then in reality you can only handle 10 parallel streams of work. Any SKU that is partitioned behind a bit of slow work is then blocked by that work.

It's doable to repartition to 100 partitions or more, but you basically need to replay the work kept in the log based on 10 partitions onto the new 100 partitions, and that operation gets more expensive over time. Then of course you're basically stuck again once your traffic increases to a high enough level that the original problem returns. If the unit of horizontal scaling is the partition, but the partition count can't be easily changed, consumers eventually lose their horizontal scalability in Kafka, from my perspective.
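To see why going from 10 to 100 partitions forces a replay, here is a small sketch assuming a plain hash-modulo partitioner (CRC32 is a stand-in hash and `partition_for` is a hypothetical helper, not a Kafka API). Most keys map to a different partition number once the count changes, so a key's older records no longer sit in the partition that receives its new records.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash-modulo partitioner, for illustration only.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"sku-{i}" for i in range(1000)]

# Keys whose partition number changes when growing from 10 to 100 partitions.
moved = [k for k in keys if partition_for(k, 10) != partition_for(k, 100)]

# Because 100 is a multiple of 10, a key from old partition p always lands
# in a new partition congruent to p modulo 10, so each old partition
# effectively splits into ten.
splits_cleanly = all(partition_for(k, 100) % 10 == partition_for(k, 10) for k in keys)
```

Per-key ordering across the old and new layout is only guaranteed if the existing records are rewritten under the new mapping, which is the replay cost described above.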

On the other hand, Kafka partitions are relatively cheap on both the broker and client side; 100 partitions do not require 100 parallel consumers, so over-provisioning is not so risky.
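As an illustration of over-provisioning, here is a simplified round-robin assignment of 100 partitions to 4 consumers. This is a sketch only: `assign` is a hypothetical function, and Kafka's real assignors (range, round-robin, sticky) run inside the consumer group protocol and differ in detail.

```python
def assign(num_partitions, consumers):
    """Spread partition numbers over consumers round-robin."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# 100 partitions, but only 4 consumers today; each one owns 25 partitions.
assignment = assign(100, ["c0", "c1", "c2", "c3"])

# Adding consumers later just re-spreads the same 100 partitions,
# with no change to which key maps to which partition.
scaled = assign(100, [f"c{i}" for i in range(20)])
```

The partition count sets the ceiling on parallelism, while the consumer count can grow toward that ceiling as traffic increases.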
This strikes me as mixing the physical and logical models.
There are logical and physical partitions.

A logical partition is always handled by the same physical partition, but a physical partition can handle multiple logical partitions.
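One way to picture the two levels, as a sketch under my own assumptions (the counts and function names are hypothetical, and CRC32 is a stand-in hash): keys hash into a large, fixed set of logical partitions, and a separate mapping places each logical partition on one of a smaller number of physical partitions.

```python
import zlib
from collections import defaultdict

NUM_LOGICAL = 1024  # hypothetical: fixed and large, chosen once
NUM_PHYSICAL = 10   # hypothetical: the actual partitions in use


def logical_partition(key: str) -> int:
    # Stand-in hash; the logical partition for a key never changes.
    return zlib.crc32(key.encode("utf-8")) % NUM_LOGICAL


def physical_partition(logical: int, num_physical: int = NUM_PHYSICAL) -> int:
    # Many logical partitions share one physical partition.
    return logical % num_physical


# Which logical partitions each physical partition is responsible for.
owners = defaultdict(list)
for logical in range(NUM_LOGICAL):
    owners[physical_partition(logical)].append(logical)
```

The concurrency boundary lives at the logical level; the physical level only decides where that work is placed.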