What I mean is that we're not doing the same thing as Cascading (http://www.cascading.org/), which requires you to transform your problem into the tuple-space domain. Stream processing frameworks like Cascading are meant for green-field implementations that maximize incremental performance.

The Pipeline API, on the other hand, is task-oriented. Developers use it procedurally: the focus is on passing parameters and return values between tasks and on scheduling those tasks. That makes it easy to reuse your existing code in this framework. Think of it as something closer to a parallelizable Bash than a data processing framework.
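To make the "parallelizable Bash" idea concrete, here is a rough sketch of the task-oriented style using only Python's standard library. It is not the Pipeline API itself; the function names (`word_count`, `total`, `run_pipeline`) are made up for illustration, and `concurrent.futures` is just standing in for whatever scheduler the framework provides.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """An ordinary existing function, reused unchanged as a task (hypothetical)."""
    return len(text.split())


def total(counts):
    """A second plain function that consumes the first step's return values."""
    return sum(counts)


def run_pipeline(documents):
    """Procedural flow: pass arguments in, collect return values, hand them on."""
    # Fan out: schedule one word_count task per document in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        counts = list(pool.map(word_count, documents))
    # Fan in: the return values of the parallel step feed the next step.
    return total(counts)


if __name__ == "__main__":
    print(run_pipeline(["a b c", "d e", "f"]))
```

Note that nothing here had to be recast as tuples or streams: the tasks are plain functions, and the only "framework" part is who calls them and when.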