by marcyb5st 2827 days ago
What about Apache Beam? Getting started with the Python SDK has been very easy, IMHO. Also, you are future-proof, as you can easily switch the runner from local to Dataflow/Flink/...

I use Beam for my Dataflow jobs, but its local "DirectRunner" is just for testing purposes. Like Spark, Beam is a huge beast. Pypeline was created with simplicity in mind: it's a pure Python library with no dependencies.
But it's not future-proof if you have to use Python 2.7, unfortunately.