by tristanz
4038 days ago
Yeah, this is a good way to look at it. But it's also my point. Partitions are about parallelism and don't always fit the data model or domain. While you can reduce the number of partitions to 1, this limits parallelism. It's not always easy to design a partitioning scheme that preserves the replication semantics you want. And these semantics vary from having something totally ordered to having something that you can replay at a very fine-grained level (to replicate to the client, for instance). Most discussions of the advantages of logs really emphasize how amazing a totally ordered log is for replication, but that's not actually what production deployments look like, so you still need to think carefully about what happens when writes are being applied to your datastores without a clear order.
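To make the ordering point concrete, here is a minimal sketch in plain Python, with no Kafka client involved. The names `partition_for` and `append_all` are hypothetical, and the CRC32 hash is only a stand-in, not Kafka's actual partitioner. It shows that writes for the same key keep their relative order inside one partition, while a total order across all writes only exists when there is a single partition.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from a stable hash of the key (illustrative scheme only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_all(writes, num_partitions):
    """Append (key, value) writes to per-partition logs in arrival order.

    Each entry keeps its global arrival number so the lost ordering is visible.
    """
    logs = defaultdict(list)
    for seq, (key, value) in enumerate(writes):
        logs[partition_for(key, num_partitions)].append((seq, key, value))
    return dict(logs)


writes = [("user:1", "a"), ("user:2", "b"), ("user:1", "c"), ("user:2", "d")]

# Several partitions: order is only guaranteed within each partition.
logs = append_all(writes, num_partitions=4)

# One partition: a single totally ordered log, but no parallelism.
single = append_all(writes, num_partitions=1)

for partition, entries in sorted(logs.items()):
    print(partition, entries)
```

A consumer reading two of these partitions in parallel has no way to recover the original interleaving of `user:1` and `user:2` writes unless the entries carry something like the arrival number kept here.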
This is something that tends to surprise developers early on (myself included, years ago). But plenty of people still use queue solutions like RabbitMQ without thinking it all the way through.
Unfortunately, partitioning introduces a design step that makes it a little harder to make processing generic. With RabbitMQ you just post to an exchange and let queues (i.e., consumers) filter on the routing keys; if no queues have been bound, for example, messages don't go anywhere. If you want, or don't want, parallelism, you just run either multiple consumers or just one. With Kafka, you need to decide beforehand and design the "topology" of your log carefully, not just for the producer but for each consumer. When producers and consumers are different apps, this starts smelling like a violation of the principle of "separation of concerns".
I rather wish Kafka had a better routing mechanism, actually. I don't see any reason why it couldn't have routing keys, just like RabbitMQ.
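For anyone who hasn't used routing keys, here is a rough in-memory sketch of the idea in plain Python. It is not the RabbitMQ client API; the `Exchange` class and `matches` helper are hypothetical. It assumes topic-exchange style patterns, where `*` stands for exactly one dot-separated word and `#` for zero or more words. The producer only picks a routing key; each consumer decides what it wants by binding a pattern, and a message that matches no binding is simply dropped.

```python
def matches(pattern: str, routing_key: str) -> bool:
    """Topic-style match: '*' is exactly one word, '#' is zero or more words."""

    def go(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' either absorbs nothing (skip it) or absorbs one more word.
            return go(p[1:], k) or (bool(k) and go(p, k[1:]))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and go(p[1:], k[1:])

    return go(pattern.split("."), routing_key.split("."))


class Exchange:
    """Toy exchange: consumers bind patterns, producers publish with a key."""

    def __init__(self):
        self.bindings = []  # list of (pattern, queue) pairs

    def bind(self, pattern: str) -> list:
        """Create a queue bound to the pattern and return it."""
        queue = []
        self.bindings.append((pattern, queue))
        return queue

    def publish(self, routing_key: str, message: str) -> int:
        """Copy the message to every matching queue; return the match count."""
        delivered = 0
        for pattern, queue in self.bindings:
            if matches(pattern, routing_key):
                queue.append(message)
                delivered += 1
        return delivered


exchange = Exchange()
dropped = exchange.publish("orders.created", "nobody is listening yet")

created_only = exchange.bind("orders.created")
all_orders = exchange.bind("orders.#")
exchange.publish("orders.created", "order 1 created")
exchange.publish("orders.eu.cancelled", "order 2 cancelled")
```

The point of the sketch is that the producer code never changes when a new consumer shows up with a different filter, which is the kind of decoupling a partitioned log makes you design for up front.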