|
|
|
|
|
by hashhar
1918 days ago
|
|
At what point does it not make sense anymore to perform all steps of an operation on a single partition of data together?
Is there a point of diminishing returns? I've seen some stream processing systems follow the partition data and apply all transformations at once (e.g. Kafka Streams) while others parallelize the transformations (e.g. Apache Storm IIRC). Also isn't there a tradeoff that in a depth-first (for lack of better term) processing paradigm error-recovery becomes more costly? |
|
Usually those other operations which force the local pipeline to end occur in a query plan prior to hitting any kind of tradeoff of doing too much in a local pipeline, since local pipelines are SO much faster than what happens when communication is required.