A Scala API for Google Cloud Dataflow


	A Scala API for Google Cloud Dataflow (github.com)
	96 points by Mullefa 3739 days ago

5 comments

samuell 3739 days ago

It's a bit interesting that Cloudera went the opposite way from Spotify and fitted Google's Java API on top of Spark instead (so they changed the backend instead of the "frontend") [1].

[1] https://github.com/cloudera/spark-dataflow


sinisa 3738 days ago

Scio author here.

A bit of background: Spark and Flink are both frameworks with their own execution engines. Scalding is tightly coupled to Cascading + Hadoop as its execution engine (Tez support is also WIP). The Dataflow Java SDK/Apache BEAM, on the other hand, is designed to be a simple abstraction with pluggable engines, and the Cloud Dataflow service is just one of the many possible runners.

Right now there are:

- local runner

- Dataflow runner, fully managed service in GCP

- Spark runner

- Flink runner

Scio wraps the Dataflow Java SDK (Apache BEAM) and can potentially leverage any available runner.
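
To make the "pluggable engines" idea concrete, here is a minimal sketch in plain Scala. It does not use the Dataflow SDK or Scio; every name in it (`Pipeline`, `Runner`, `LocalRunner`, `ChunkedRunner`) is hypothetical and only illustrates the separation between describing a pipeline and choosing how to execute it.

```scala
object RunnerSketch {
  // A pipeline is only a description: input data plus element-wise steps.
  // (Hypothetical type, not the Dataflow SDK's Pipeline.)
  final case class Pipeline(
      source: Seq[String],
      steps: Vector[String => Seq[String]] = Vector.empty
  ) {
    def flatMap(f: String => Seq[String]): Pipeline = copy(steps = steps :+ f)
    def map(f: String => String): Pipeline = flatMap(s => Seq(f(s)))
  }

  // A runner decides how a pipeline description gets executed.
  trait Runner {
    def run(p: Pipeline): Seq[String]
  }

  // Applies each step to the whole input, in-process.
  object LocalRunner extends Runner {
    def run(p: Pipeline): Seq[String] =
      p.steps.foldLeft(p.source)((data, step) => data.flatMap(step))
  }

  // Stand-in for a distributed engine: processes the input chunk by chunk.
  final class ChunkedRunner(chunkSize: Int) extends Runner {
    def run(p: Pipeline): Seq[String] =
      p.source
        .grouped(chunkSize)
        .flatMap(chunk => p.steps.foldLeft(chunk)((data, step) => data.flatMap(step)))
        .toVector
  }

  // The same description can be handed to either runner.
  val pipeline: Pipeline =
    Pipeline(Seq("a b", "c", "d e f"))
      .flatMap(_.split(" ").toList)
      .map(_.toUpperCase)
}
```

Because the steps here are element-wise, `RunnerSketch.LocalRunner.run(RunnerSketch.pipeline)` and `new RunnerSketch.ChunkedRunner(2).run(RunnerSketch.pipeline)` yield the same elements; only the execution strategy differs.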


lucdurette 3738 days ago

Interesting project, glad to see more and more organisations using Scala for data projects.


wiradikusuma 3739 days ago

scio is also the name for a portable molecular sensor: https://www.consumerphysics.com/myscio/scio


hnbroseph 3738 days ago

this is cool! thanks for mentioning it, i may have to grab myself a unit or two.


esses 3738 days ago

this is the best non-thread related thread-related post ever. very cool!!


anacleto 3739 days ago

Is this native? Or just a Scala wrapper?


samuell 3739 days ago

I would expect it to be as native as Google's own Java API [1], though it is still just the API, not the actual backend.

[1] https://github.com/GoogleCloudPlatform/DataflowJavaSDK


sinisa 3738 days ago

Correct, it's a thin Scala wrapper with some additional features. Execution is delegated to Dataflow/BEAM.
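
For readers unfamiliar with the "thin wrapper" pattern, a rough sketch of the general idea follows. This is not Scio's actual code: `Fn`, `JavaStyleCollection`, and `ScalaCollection` are hypothetical stand-ins for a Java-style API and a Scala facade over it. The wrapper holds no logic of its own beyond adapting Scala functions and delegating.

```scala
object WrapperSketch {
  // Java-style API: transforms are passed as single-method objects.
  // (Hypothetical; stands in for an SDK the wrapper does not own.)
  trait Fn[A, B] {
    def process(a: A): B
  }

  final class JavaStyleCollection[A](val items: List[A]) {
    def applyFn[B](fn: Fn[A, B]): JavaStyleCollection[B] =
      new JavaStyleCollection(items.map(fn.process))
  }

  // Thin Scala wrapper: accepts ordinary Scala functions and
  // delegates all real work to the underlying collection.
  final class ScalaCollection[A](val underlying: JavaStyleCollection[A]) {
    def map[B](f: A => B): ScalaCollection[B] =
      new ScalaCollection(
        underlying.applyFn(new Fn[A, B] { def process(a: A): B = f(a) })
      )
  }

  // Usage: idiomatic Scala on the outside, the wrapped API on the inside.
  val lengths: List[Int] =
    new ScalaCollection(new JavaStyleCollection(List("scio", "dataflow")))
      .map(_.length)
      .underlying
      .items
}
```

The design choice this illustrates is that the wrapper only changes the "frontend": anything the underlying API can execute, the wrapper can execute too.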


ecesena 3738 days ago

Any plan to port it to Beam?


sinisa 3738 days ago

Scio author here. Yes, as soon as BEAM finishes bootstrapping.
