by boredandroid
4038 days ago
One thing to realize is that a partitioned log is a generalization of an unpartitioned log (i.e. if you set the number of partitions to 1 in a partitioned log, you have an unpartitioned log). In Kafka the purpose of partitions is to provide computational parallelism, not to model entities in the world. So if you have 100m users, you would map them onto a number of partitions chosen for your computational parallelism (maybe 10-100 machines/processes/threads). In other words, you would have a single topic partitioned by user id, not a topic per user. A centralized relational database maps reasonably well to a single-partition log (both in terms of scalability and guarantees). Distributed databases generally don't have a total order over all operations. What you usually have is (at best) a per-partition ordering, which also maps well to a partitioned log. For applications that record events (logging or whatever), it is natural to think of each application thread or process as a kind of actor with a total order.
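To make the "single topic partitioned by user id" idea concrete, here is a rough sketch in Python. It is a toy in-memory model, not Kafka client code: `PartitionedLog`, `partition_for`, and `NUM_PARTITIONS` are names made up for illustration, and CRC32 is only a stand-in for whatever stable hash a real partitioner uses.

```python
import zlib
from collections import defaultdict

# Hypothetical value: sized for parallelism (workers), not for the user count.
NUM_PARTITIONS = 16


def partition_for(user_id, num_partitions=NUM_PARTITIONS):
    """Map a user id to a partition using a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


class PartitionedLog:
    """Toy in-memory log: one append-only list per partition."""

    def __init__(self, num_partitions=NUM_PARTITIONS):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def append(self, user_id, event):
        """Append an event and return its (partition, offset)."""
        p = partition_for(user_id, self.num_partitions)
        self.partitions[p].append((user_id, event))
        return p, len(self.partitions[p]) - 1
```

All events for a given user id land in the same partition, so they are ordered relative to each other, while events for different users may sit in different partitions with no ordering between them. Constructing `PartitionedLog(1)` collapses this to the unpartitioned case: every event goes to partition 0 and the log is totally ordered.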