by andy_wrote 4124 days ago
Is there anyone with experience across comparable workflow managers who can quickly lay out the pros and cons of Pinball vs. Luigi? Perhaps someone at Pinterest who tried out other systems, as was mentioned in the post? (Though maybe Luigi wasn't publicly available when this comparison happened.)
2 comments

Luigi was not publicly available when Pinball started, so I'm not sure about the pros and cons of Pinball versus Luigi.

When we built Pinball, we aimed for a scalable and flexible workflow manager that satisfies the following requirements (I'll name just a few here).

1. Easy system upgrade: when we fix bugs or add new features, there should be no interruption to currently running workflows and jobs.
2. Easy to add and test workflows: end users can easily add new jobs and workflows to the Pinball system without affecting other running jobs and workflows.
3. Extensibility: a workflow manager should be easy to extend. As the company and business grow, there will be a lot of new requirements and features needed. We would also love your contributions.
4. Flexible workflow scheduling policy and easy failure handling.
5. A rich UI for easily managing your workflows:
   - failed jobs are retried automatically
   - you can retry a failed job, skip a job, or select a subset of a workflow's jobs to run (all from the UI)
   - you can easily access the full run history of your jobs, along with their stderr and stdout logs
   - you can explore the topology of your workflow, with easy search
6. Pinball is very generic and can support different kinds of platforms. You can use different Hadoop clusters, e.g., a Qubole cluster or an EMR cluster. You can write different kinds of jobs, e.g., Hadoop streaming, Cascading, Hive, Pig, Spark, Python ...
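
To make the retry, skip, and run-a-subset behaviour in point 5 concrete, here is a minimal standard-library sketch. It is not Pinball's code or API: `run_workflow`, its parameters, and the outcome labels are hypothetical names made up for illustration, and it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def run_workflow(jobs, deps, selected=None, skip=(), max_attempts=3):
    """Run jobs in dependency order, retrying failures automatically.

    jobs:     job name -> zero-argument callable
    deps:     job name -> set of upstream job names
    selected: optional subset of job names to run (default: all jobs)
    skip:     job names to mark as skipped without running them
    Returns a history dict: job name -> list of attempt outcomes.
    """
    wanted = set(jobs) if selected is None else set(jobs) & set(selected)
    graph = {name: deps.get(name, set()) for name in jobs}
    history = {}
    for name in TopologicalSorter(graph).static_order():
        if name not in wanted:
            continue  # not part of this run
        if name in skip:
            history[name] = ["skipped"]
            continue
        # Do not run a job whose upstream job failed or was itself blocked.
        upstream = [d for d in deps.get(name, ()) if d in history]
        if any(history[d][-1] in ("failed", "blocked") for d in upstream):
            history[name] = ["blocked"]
            continue
        attempts = []
        for _ in range(max_attempts):
            try:
                jobs[name]()
                attempts.append("ok")
                break
            except Exception:
                attempts.append("failed")  # retry until attempts run out
        history[name] = attempts
    return history


# A job that fails on its first attempt and succeeds on the second.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient error")


jobs = {
    "extract": lambda: None,
    "transform": flaky,
    "load": lambda: None,
    "report": lambda: None,
}
deps = {"transform": {"extract"}, "load": {"transform"}, "report": {"load"}}

# Run only a subset of the workflow; "report" is left out.
history = run_workflow(jobs, deps, selected={"extract", "transform", "load"})
print(history)
# {'extract': ['ok'], 'transform': ['failed', 'ok'], 'load': ['ok']}
```

The per-job list of outcomes stands in for the run history mentioned above; a real system would also persist logs and timestamps.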

There are a lot of interesting things built into Pinball, and you'll probably want to give it a try!

We are heavy users of Luigi at my company. Its central scheduler process also serves the UI, and the UI sometimes gets stuck for us.

Luigi does have a lot of pipeline building blocks, though: it provides APIs to access HDFS and S3, read from and write to them, etc. They are very useful, but they are executed in the same Python process as the rest of the job, which heavily loads the machine where the job runs (in our case, the same server where the luigid scheduler runs).
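
As a rough illustration of the alternative, a job body can run in a separate interpreter process instead of in-process. This is a generic standard-library sketch, not how Luigi or Pinball actually launches jobs, and `run_isolated` is a hypothetical helper name. A child process on the same server still consumes that server's CPU and memory, so it only isolates crashes and memory from the parent; taking the load off the scheduler box still means running the job on another machine.

```python
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run a Python snippet in a fresh interpreter and capture its output.

    Returns (return code, stdout text, stderr text).
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect the child's stdout and stderr
        text=True,            # decode the captured output to str
        timeout=timeout,      # raises subprocess.TimeoutExpired if exceeded
    )
    return result.returncode, result.stdout, result.stderr


# The "job" here is a stand-in for real work such as reading from HDFS or S3.
rc, out, err = run_isolated("print(sum(range(1000)))")
print(rc, out.strip())
# 0 499500
```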

I'm excited about Pinball's architecture. I'd like to try using Pinball as the scheduler to execute instances of our existing Luigi task classes on multiple servers.