by sinisa 3743 days ago
Scio author here.

A bit of background: Spark and Flink are both frameworks with their own execution engines. Scalding is tightly coupled with Cascading + Hadoop as its execution engine (Tez support is also a WIP). The Dataflow Java SDK/Apache Beam, on the other hand, is designed to be a simple abstraction with pluggable engines, and the Cloud Dataflow service is just one of the many possible runners.

Right now there are:

- local runner

- Dataflow runner, fully managed service in GCP

- Spark runner

- Flink runner

Scio wraps the Dataflow Java SDK (Apache Beam) and can potentially leverage any available runner.
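
To make the "simple abstraction with pluggable engines" idea concrete, here is a rough toy sketch in Python. It is not the Beam or Scio API: `Pipeline`, `Runner`, `LocalRunner` and `StagedRunner` are all hypothetical names made up for illustration. The point is only that the pipeline is described once and each runner decides how to execute it.

```python
from typing import Callable, List, Protocol


class Pipeline:
    """Engine-agnostic description of work: an input plus an ordered list of steps."""

    def __init__(self, source: List[int]) -> None:
        self.source = source
        self.steps: List[Callable[[int], int]] = []

    def map(self, fn: Callable[[int], int]) -> "Pipeline":
        # Only record the step; nothing executes until a runner is chosen.
        self.steps.append(fn)
        return self


class Runner(Protocol):
    """Anything that can execute a Pipeline (hypothetical interface)."""

    def run(self, pipeline: Pipeline) -> List[int]: ...


class LocalRunner:
    """Hypothetical runner: pushes each element through all steps, in process."""

    def run(self, pipeline: Pipeline) -> List[int]:
        out = []
        for x in pipeline.source:
            for fn in pipeline.steps:
                x = fn(x)
            out.append(x)
        return out


class StagedRunner:
    """Hypothetical runner: applies one step at a time to the whole dataset."""

    def run(self, pipeline: Pipeline) -> List[int]:
        data = list(pipeline.source)
        for fn in pipeline.steps:
            data = [fn(x) for x in data]
        return data


def build_pipeline() -> Pipeline:
    # The same pipeline definition is reused unchanged for every runner.
    return Pipeline([1, 2, 3]).map(lambda x: x + 1).map(lambda x: x * 10)


# Run the identical pipeline on both runners and collect the results by name.
results = {
    type(runner).__name__: runner.run(build_pipeline())
    for runner in (LocalRunner(), StagedRunner())
}
```

Both runners should produce the same output for the same pipeline, which is the property that lets a wrapper sit on top of the abstraction without caring which engine runs underneath.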