| HN Mirror

In general, whenever you need to perform a join (of multiple datasets), that ends the pipeline of local operations on a partition. Other operators as well that necessarily require data from other partitions end local pipelines. This is why linear scalability is not completely achieved in practice. Most interactions with data cannot be performed in a completely partitionable way.

Usually those other operations which force the local pipeline to end occur in a query plan prior to hitting any kind of tradeoff of doing too much in a local pipeline, since local pipelines are SO much faster than what happens when communication is required.