Hacker News new | ask | show | jobs
by shib71 5657 days ago
My original thought was shot down when I read this:

  The Pipeline API is not a data or stream processing engine.
1 comments

What I mean by this is that we're not doing the same thing as Cascading (http://www.cascading.org/), which requires you to transform your problem into the tuple-space domain. Stream processing frameworks like Cascading are for green-field implementations that maximize incremental performance.

On the other hand, the Pipeline API is task oriented. Developers use it with a procedural approach. The focus is on parameter and return value passing and scheduling. It's easy to reuse your existing code in this framework. Think of it as something closer to a parallelizable Bash than a data processing framework.