|
|
|
|
|
by tristanz
4038 days ago
|
|
Martin's talks and blog posts are aways awesome. I'm really excited to see how this plays out for real applications. The one thing I'm always somewhat confused by though is how a "totally ordered log" intersects with the reality of a partitioned log. The simplicity of a log seems to break down a bit when you partition. For instance, imagine I want to implement multi-key transactions on top of a distributed datastore. With a totally ordered log this is easy. But with a partitioned log, it becomes much harder. Alternatively, imagine I want to implement a collaborative editing app like Google Docks or something like Slack. A natural design would be to to have millions of independent logs. I can then replay logs to get current state and watch logs to keep it updated. But as far as I'm aware, partitioned logs like Kafka do not actually support millions of topics. So there's no way to replay a log for something like a channel or document. |
|
In Kafka the purpose of partitions is to provide computational parallelism not model entities in the world. So if you have 100m users you would map that into a number of partitions based on your computational parallelism (maybe 10-100 machines/processes/threads). In other words you would have a single topic partitioned by user id, not a topic per user.
If you have a centralized relational database that maps reasonably well to a single partition log (both in terms of scalability and guarantees).
For distributed databases you generally don't have a total order over all operations. What you usually have is (at best) a per partition ordering, which maps well to a partitioned log as well.
For applications that record events (logging or whatever) it is natural to think of each application thread or process as a kind of actor with a total order.