

by adamcharnock 2830 days ago

    Pypeline was designed to solve simple medium 
    data tasks that require concurrency 
    and parallelism but where using frameworks 
    like Spark or Dask feel exaggerated or unnatural.

This is exactly what I was looking for very recently. Thank you for writing this, I'll certainly look into it.


marcyb5st 2830 days ago

What about Apache Beam? Getting started with the Python SDK has been very easy IMHO. Also, you are future-proof, as you can easily switch the runner from Local to Dataflow/Flink/...


cgarciae 2829 days ago

I use Beam for my Dataflow jobs, but its local "DirectRunner" is just for testing purposes. Like Spark, Beam is a huge beast. Pypeline was created with simplicity in mind: it's a pure Python library with no dependencies.


philote 2829 days ago

But unfortunately not future-proof, since you have to use Python 2.7.
