Hacker News
by UK-Al05 2024 days ago
That's why you partition by some id, say the stock SKU id for stock control. Then you can handle other SKUs in parallel; processing is only serial within a single SKU. That's probably the maximum performance you're going to get out of a traditional db anyway.
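A minimal sketch of "serial per SKU, parallel across SKUs", assuming an in-memory list of (sku_id, delta) events. `partition_for`, `NUM_PARTITIONS`, `process_partition` and `run` are hypothetical names. CRC32 stands in for a stable hash; Kafka's default partitioner uses murmur2, not this.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 10  # hypothetical partition count


def partition_for(sku_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash so the same SKU always lands in the same partition
    # (Python's built-in hash() is randomized per process for strings).
    return zlib.crc32(sku_id.encode("utf-8")) % num_partitions


def process_partition(events):
    # Events inside one partition are applied strictly in order,
    # so all updates to a given SKU are serialized.
    stock = defaultdict(int)
    for sku_id, delta in events:
        stock[sku_id] += delta
    return dict(stock)


def run(events):
    by_partition = defaultdict(list)
    for sku_id, delta in events:
        by_partition[partition_for(sku_id)].append((sku_id, delta))
    # Different partitions (and therefore different SKUs) run in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(process_partition, by_partition.values()))
    totals = {}
    for partial in results:
        # Safe to merge: each SKU appears in exactly one partition.
        totals.update(partial)
    return totals
```

Because a SKU never spans two partitions, merging the per-partition results cannot overwrite anything.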

This definitely seems like the "Kafka" way to solve this problem, but I fear there are implications to this partitioning scheme I'd love to see answered. For example, partition counts aren't infinite, and they aren't easily adjusted after the fact. So if you choose, say, 10 partitions up front for a SKU space that is nearly unbounded, then in reality you can only handle 10 parallel streams of work. Any SKU that hashes to a partition where a slow message is being processed is blocked until that message finishes, even though it has nothing to do with it.
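A toy simulation of that head-of-line blocking, assuming each partition is consumed strictly serially and processing costs are known up front (in milliseconds, to keep the arithmetic exact). `completion_times` is a hypothetical helper, not part of any Kafka client.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, n: int) -> int:
    # Stand-in stable hash; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % n


def completion_times(messages, n):
    """messages: (sku_id, cost_ms) pairs in arrival order.

    Each partition is drained serially, so a message finishes only after
    every earlier message in the same partition has finished.
    """
    clock = defaultdict(int)  # per-partition elapsed time in ms
    done = []
    for sku_id, cost_ms in messages:
        p = partition_for(sku_id, n)
        clock[p] += cost_ms
        done.append((sku_id, clock[p]))
    return done


# With a single partition, a 100 ms message waits behind a 30 s one.
print(completion_times([("slow-sku", 30_000), ("fast-sku", 100)], n=1))
```

More partitions only lower the odds of sharing a queue: with 10 partitions and a roughly uniform hash, about one in ten pairs of SKUs still collide.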

It's possible to repartition to 100 or more partitions, but you essentially have to replay the work already in the log, which was written under the 10-partition scheme, onto the new 100 partitions, and that replay gets more expensive as the log grows. And of course, once your traffic rises high enough, the original problem returns and you're stuck again. From my perspective, if the partition is the unit of horizontal scaling but the partition count can't easily be changed, Kafka consumers eventually lose their horizontal scalability.
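Why the replay is needed: with hash-modulo partitioning, changing the partition count moves most keys to a different partition, so a key's old history and its new messages end up in different places. A rough sketch, assuming CRC32 modulo hashing as a stand-in for Kafka's murmur2-based default partitioner and hypothetical SKU names:

```python
import zlib


def partition_for(key: str, n: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % n


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    # Fraction of keys whose partition number changes after resizing.
    moved = sum(1 for k in keys if partition_for(k, old_n) != partition_for(k, new_n))
    return moved / len(keys)


keys = [f"sku-{i}" for i in range(10_000)]
print(moved_fraction(keys, 10, 100))
```

Going from 10 to 100, a key keeps its partition number only when its hash modulo 100 is below 10, so you should expect roughly 0.9 of the keys to move.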

On the other hand, Kafka partitions are relatively cheap on both the broker and the client side; 100 partitions do not require 100 parallel consumers, so over-provisioning is not very risky.
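To illustrate over-provisioning, here is a simplified round-robin assignment of 100 partitions to a handful of consumers. It is loosely modeled on the idea behind Kafka's round-robin assignment strategy, not its actual implementation; `assign_round_robin` and the consumer names are hypothetical.

```python
def assign_round_robin(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    # Deal partitions out like cards: partition p goes to consumer p mod C.
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# 100 partitions shared by 4 consumers: 25 each. Adding consumers later
# (up to 100) spreads the same partitions thinner without repartitioning.
print({c: len(ps) for c, ps in assign_round_robin(100, ["c0", "c1", "c2", "c3"]).items()})
```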
This strikes me as mixing the physical and logical models.
There are logical and physical partitions.

Logical partitions are always handled by the same physical partition. But physical partitions can handle multiple logical partitions.
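One way to sketch that split, assuming a fixed, generously sized number of logical partitions (similar in spirit to Redis Cluster's 16384 hash slots) and a mutable table mapping each logical partition to a physical one. `NUM_LOGICAL`, `logical_partition` and `PartitionMap` are hypothetical names for illustration.

```python
import zlib

NUM_LOGICAL = 1024  # hypothetical; chosen once and never changed


def logical_partition(key: str) -> int:
    # Fixed forever: a key always lands in the same logical partition.
    return zlib.crc32(key.encode("utf-8")) % NUM_LOGICAL


class PartitionMap:
    def __init__(self, num_physical: int):
        # Start by spreading logical partitions evenly over physical ones.
        self.table = [lp % num_physical for lp in range(NUM_LOGICAL)]

    def physical_for(self, key: str) -> int:
        return self.table[logical_partition(key)]

    def move(self, lp: int, new_physical: int) -> None:
        # Rebalancing reassigns a whole logical partition, so the keys in it
        # move together and per-key ordering can be kept (drain, then switch).
        self.table[lp] = new_physical


m = PartitionMap(num_physical=4)
m.move(logical_partition("sku-42"), 7)  # scale out by moving one slot
print(m.physical_for("sku-42"))
```

Scaling out then means editing the table, not rehashing keys.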