by adamcharnock 2830 days ago

    Pypeline was designed to solve simple medium 
    data tasks that require concurrency 
    and parallelism but where using frameworks 
    like Spark or Dask feel exaggerated or unnatural.
This is exactly what I was looking for very recently. Thank you for writing this, I'll certainly look into it.

What about Apache Beam? Getting started with the Python SDK has been very easy IMHO. Also, you are future-proof, as you can easily switch the runner from Local to Dataflow/Flink/...
I use Beam for my Dataflow jobs, but its local "DirectRunner" is just for testing purposes. Like Spark, Beam is a huge beast. Pypeline was created with simplicity in mind: it's a pure Python library with no dependencies.
But not future-proof, since it requires Python 2.7, unfortunately.