Hacker News
by jkh1 1050 days ago
How many workflow management systems do we need? Over 300[1]. If that's not reinventing the wheel, what is?

[1] https://github.com/common-workflow-language/common-workflow-...


Pretty much everyone underestimates the complexity of workflow scheduling.

This means something like 99% of tools have had assorted limitations and quirks that make them unsuitable for a lot of use cases.

People who only use these tools mostly never realize this (and so complain about the "reinvention").

The folks I know who have tried to implement a pipeline tool are usually more aware of the challenges, and of how hard it is to make something truly general.

I say this as someone who evaluated a dozen tools, then extended an existing one to fix some of its limitations (Luigi, with our SciLuigi extension), and finally developed our own tool (SciPipe).

Things have gotten better, and a tool like Nextflow is pretty generic these days, although it may still have limitations. For example, before DSL2, Nextflow lacked the reusable modules we needed, which is why we developed SciPipe. Otherwise, SciPipe's scheduling mechanism is very similar to Nextflow's (dataflow/flow-based).
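
As a rough illustration of what dataflow/flow-based scheduling means: each step runs as an independent process that fires whenever data arrives on its input channel, and steps are wired together only by channels. The sketch below uses Python threads with `queue.Queue` as channels. It is not SciPipe's or Nextflow's API; the function names (`source`, `transform`, `sink`, `run_pipeline`) and the `.bam`/`.counts` suffixes are hypothetical.

```python
import queue
import threading

# Sentinel object marking the end of a stream on a channel.
END = object()


def source(items, out_ch):
    """Emit each item on the output channel, then signal end-of-stream."""
    for item in items:
        out_ch.put(item)
    out_ch.put(END)


def transform(func, in_ch, out_ch):
    """Apply func to each item as soon as it arrives (data-driven firing)."""
    while True:
        item = in_ch.get()
        if item is END:
            out_ch.put(END)  # propagate end-of-stream downstream
            return
        out_ch.put(func(item))


def sink(in_ch, results):
    """Collect items until end-of-stream."""
    while True:
        item = in_ch.get()
        if item is END:
            return
        results.append(item)


def run_pipeline(samples):
    # Hypothetical two-step pipeline ("align" then "count"); each step is a
    # separate thread, connected to its neighbours only by FIFO channels.
    ch1, ch2, ch3 = queue.Queue(), queue.Queue(), queue.Queue()
    results = []
    threads = [
        threading.Thread(target=source, args=(samples, ch1)),
        threading.Thread(target=transform, args=(lambda s: s + ".bam", ch1, ch2)),
        threading.Thread(target=transform, args=(lambda f: f + ".counts", ch2, ch3)),
        threading.Thread(target=sink, args=(ch3, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_pipeline(["s1", "s2", "s3"]))
# ['s1.bam.counts', 's2.bam.counts', 's3.bam.counts']
```

Because no step knows about any other step, only about its channels, steps can be reused and rewired freely, and downstream work starts as soon as the first item is ready instead of waiting for a whole stage to finish.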

Still today, I have mixed feelings about relying on extremely complex tools that depend on a single organisation to keep them updated. There is also the difficulty of debugging execution, and a few other things, which is why we wanted a simple library that we could understand ourselves and step through in a debugger. (It also didn't hurt that we could get complete audit logs per output file, which can be very useful for both provenance and debugging, and which almost no other tool provides.)
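
A minimal sketch of the per-output-file audit log idea, assuming a JSON file written next to each output (`<output>.audit.json`) that nests the audit records of the task's inputs, so any single file's log carries its full upstream history. The function names, file naming, and record fields are hypothetical and are not SciPipe's actual log format.

```python
import json
import os
import tempfile
import time


def read_audit(path):
    """Return the audit record stored next to `path`, or None if absent."""
    audit_path = path + ".audit.json"
    if not os.path.exists(audit_path):
        return None
    with open(audit_path) as f:
        return json.load(f)


def run_task(name, inputs, output_path, func):
    """Run func(inputs, output_path), then write <output_path>.audit.json.

    The record nests the audit records of every input, so each output
    file's log carries its complete upstream history.
    """
    start = time.time()
    func(inputs, output_path)
    record = {
        "task": name,
        "inputs": {os.path.basename(p): read_audit(p) for p in inputs},
        "output": os.path.basename(output_path),
        "seconds": round(time.time() - start, 3),
    }
    with open(output_path + ".audit.json", "w") as f:
        json.dump(record, f, indent=2)
    return record


def write_text(text):
    """Hypothetical task body: write a fixed string to the output."""
    def body(inputs, out):
        with open(out, "w") as f:
            f.write(text)
    return body


def concat(inputs, out):
    """Hypothetical task body: concatenate all inputs into the output."""
    with open(out, "w") as f:
        for p in inputs:
            with open(p) as src:
                f.write(src.read())


def demo():
    with tempfile.TemporaryDirectory() as d:
        a, b, ab = (os.path.join(d, n) for n in ("a.txt", "b.txt", "ab.txt"))
        run_task("make_a", [], a, write_text("A"))
        run_task("make_b", [], b, write_text("B"))
        record = run_task("concat", [a, b], ab, concat)
        with open(ab) as f:
            content = f.read()
    return content, record


content, record = demo()
print(content)                            # AB
print(sorted(record["inputs"]))           # ['a.txt', 'b.txt']
print(record["inputs"]["a.txt"]["task"])  # make_a
```

Reading a single `.audit.json` is then enough to see every task that contributed to that file, which is what makes such logs handy for both provenance and debugging.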

These are just some examples of why someone might still consider developing a separate tool.

All in all, the widely used ones like Nextflow (and Snakemake) are great tools. They just aren't optimal for every use case and situation.