| Disclosure (given this is from Confluent): I'm ex MSK (Managed Streaming for Kafka at AWS) and my current company was competing with Confluent before we pivoted. Yup, this is one more example, just like Pulsar. There are definitely great optimizations to be made on the average case. In the case of parallel consumer, if you'd like to keep ordering guarantees, you retain O(n^2) processing time in the worst case. The issues arise when you try to traverse arbitrary dependency topologies in your messages. So you're left with two options: 1. Make damn sure that causal dependencies don't exhibit O(n^2) behavior, which requires formal models to be 100% sure.
2. Give up ordering or make some other nasty tradeoff. At a high level the problem boils down to traversing a DAG in topological order. From computer science theory, we know that this requires a sorted index. And if you're implementing an index on top of Kafka, you might as well embed your data into and consume directly from the index. Of course, this is easier said than done, and that's why no one has cracked this problem yet. We were going to try, but alas we pivoted :) Edit: Topological sort does not required a sorted index (or similar) if you don't care about concurrency. But then you've lost the advantages of your queue. |
Is there another way to state this? It’s very difficult for me to grok.
> DAG
Directed acyclic graph right?