by samuell 3739 days ago
It's a bit interesting that Cloudera went the opposite way from Spotify and fitted Google's Java API on top of Spark instead (so they changed the backend instead of the "frontend") [1].

[1] https://github.com/cloudera/spark-dataflow

1 comment

Scio author here.

A bit of background: Spark and Flink are both frameworks with their own execution engines. Scalding is tightly coupled to Cascading + Hadoop as its execution engine (Tez support is also WIP). Dataflow Java SDK/Apache Beam, on the other hand, is designed to be a simple abstraction with pluggable engines, and the Cloud Dataflow service is just one of the many possible runners.

Right now there are:

- local runner

- Dataflow runner, fully managed service in GCP

- Spark runner

- Flink runner

Scio wraps the Dataflow Java SDK (Apache Beam) and can potentially leverage any available runner.
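
To make the "pluggable engines" idea concrete, here is a toy sketch in plain Java. It is not the Dataflow/Beam API: `RunnerSketch`, `Runner`, `LocalRunner`, `ParallelRunner`, and `trimmedLength` are all hypothetical names made up for illustration. The only point is that the pipeline is defined once and whichever runner you hand it to decides how it gets executed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class RunnerSketch {
    // Hypothetical runner abstraction: takes input plus a pipeline and executes it.
    public interface Runner {
        <A, B> List<B> run(List<A> input, Function<A, B> pipeline);
    }

    // Stand-in for a local runner: applies the pipeline sequentially in-process.
    public static class LocalRunner implements Runner {
        @Override
        public <A, B> List<B> run(List<A> input, Function<A, B> pipeline) {
            return input.stream().map(pipeline).collect(Collectors.toList());
        }
    }

    // Stand-in for a different engine: same pipeline, run on a parallel stream.
    public static class ParallelRunner implements Runner {
        @Override
        public <A, B> List<B> run(List<A> input, Function<A, B> pipeline) {
            return input.parallelStream().map(pipeline).collect(Collectors.toList());
        }
    }

    // The pipeline is defined once and knows nothing about the engine that runs it.
    public static Function<String, Integer> trimmedLength() {
        Function<String, String> trim = String::trim;
        return trim.andThen(String::length);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(" scio ", "beam", "  spark");
        // Swapping the runner changes the execution strategy, not the pipeline.
        for (Runner runner : Arrays.asList(new LocalRunner(), new ParallelRunner())) {
            System.out.println(runner.getClass().getSimpleName() + ": "
                    + runner.run(input, trimmedLength()));
        }
    }
}
```

In the real SDK the runner is picked through pipeline options rather than by passing an object around like this, but the separation between "what the pipeline does" and "what executes it" is the same idea.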