by jsjsbdkj
1960 days ago
This is the most basic pattern for distributed joins: you hash the join key in both tables and shuffle rows based on hash ranges. In some systems, like Redshift, you can designate a distribution key so that "related" records are already co-located on a single shard.

> our data would already be sorted for the given keys (because within a partition Kafka has sorting guarantees)

It's been a while since I used Kafka, but I don't remember any "sorting guarantees". Consumers see events "in order" based on when they were produced, because each partition is a queue. That is arrival order within a partition, not a sort by key.
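For anyone who hasn't seen the hash-shuffle join spelled out, here is a rough single-process sketch of the idea. Everything in it is my own illustration rather than any particular engine's implementation: the names `shard_for`, `shuffle`, `local_hash_join`, and `distributed_join` are hypothetical, shards are plain Python lists standing in for nodes, and I use a CRC32 modulo instead of real hash ranges to keep it short.

```python
from collections import defaultdict
import zlib


def shard_for(key, num_shards):
    # Stable hash (unlike built-in hash() for strings), so every "node"
    # would agree on which shard owns a key. Modulo is a simplification
    # of the hash-range assignment a real system would use.
    return zlib.crc32(repr(key).encode("utf-8")) % num_shards


def shuffle(rows, key_index, num_shards):
    # Route each row to the shard that owns its join key.
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_index], num_shards)].append(row)
    return shards


def local_hash_join(left_rows, right_rows, left_key, right_key):
    # Ordinary in-memory hash join on one shard: build on left, probe with right.
    index = defaultdict(list)
    for row in left_rows:
        index[row[left_key]].append(row)
    return [l + r for r in right_rows for l in index.get(r[right_key], [])]


def distributed_join(left, right, left_key=0, right_key=0, num_shards=4):
    # Both tables use the same hash, so matching keys land on the same shard
    # and each shard can be joined independently.
    left_shards = shuffle(left, left_key, num_shards)
    right_shards = shuffle(right, right_key, num_shards)
    out = []
    for ls, rs in zip(left_shards, right_shards):
        out.extend(local_hash_join(ls, rs, left_key, right_key))
    return out


users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (3, "pen"), (1, "lamp"), (4, "mug")]
result = distributed_join(users, orders)
```

If one table is already distributed on the join key (the Redshift case above), the `shuffle` step for that table is effectively free, which is the whole point of picking a distribution key.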