It's a bit interesting that Cloudera went the opposite way from Spotify and fitted Google's Java API on top of Spark instead (so they changed the backend rather than the "frontend") [1].
A bit of background:
Spark and Flink are both frameworks with their own execution engines. Scalding is tightly coupled to Cascading + Hadoop as its execution engine (Tez support is also a WIP).
The Dataflow Java SDK/Apache Beam, on the other hand, is designed to be a simple abstraction with pluggable engines, and the Cloud Dataflow service is just one of many possible runners.
Right now there are:
- local runner
- Dataflow runner, fully managed service in GCP
- Spark runner
- Flink runner
Scio wraps the Dataflow Java SDK (Apache Beam) and can potentially leverage any available runner.
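To make the "pluggable engines" idea a bit more concrete, here is a toy sketch in plain Java. It is not the Dataflow/Beam API: ToyPipeline, ToyRunner, SequentialRunner, ParallelRunner and RunnerSketch are all made-up names, and the "engines" are just a for loop and a parallel stream. The only point is that the pipeline is described once and the runner decides how it gets executed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical toy model (not the real SDK): a pipeline is only a description,
// here an ordered list of string transforms, with no execution logic of its own.
final class ToyPipeline {
    final List<Function<String, String>> steps = new ArrayList<>();

    ToyPipeline apply(Function<String, String> step) {
        steps.add(step);
        return this;
    }

    // Runs every step, in order, on a single element.
    String process(String element) {
        String value = element;
        for (Function<String, String> step : steps) {
            value = step.apply(value);
        }
        return value;
    }
}

// A runner decides how a pipeline description is executed.
interface ToyRunner {
    List<String> run(ToyPipeline pipeline, List<String> input);
}

// Stand-in for a local runner: a sequential in-process loop.
final class SequentialRunner implements ToyRunner {
    public List<String> run(ToyPipeline pipeline, List<String> input) {
        List<String> out = new ArrayList<>();
        for (String element : input) {
            out.add(pipeline.process(element));
        }
        return out;
    }
}

// Stand-in for a different engine: same pipeline, executed on a parallel stream.
final class ParallelRunner implements ToyRunner {
    public List<String> run(ToyPipeline pipeline, List<String> input) {
        return input.parallelStream()
                .map(pipeline::process)
                .collect(Collectors.toList()); // keeps the input order
    }
}

final class RunnerSketch {
    static ToyPipeline buildPipeline() {
        return new ToyPipeline().apply(String::trim).apply(String::toUpperCase);
    }

    public static void main(String[] args) {
        ToyPipeline pipeline = buildPipeline();
        List<String> input = Arrays.asList(" spark ", " flink ");
        // The pipeline is unchanged; only the runner differs.
        System.out.println(new SequentialRunner().run(pipeline, input)); // prints [SPARK, FLINK]
        System.out.println(new ParallelRunner().run(pipeline, input));   // prints [SPARK, FLINK]
    }
}
```

In this toy model, something like Scio would sit on the ToyPipeline side (a nicer way to build the description), while the Cloudera project would correspond to adding another ToyRunner implementation.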
[1] https://github.com/cloudera/spark-dataflow