| The bioinformatics workflow managers are designed around the quirkiness of bioinformatics, and they remove a lot of boilerplate. That makes them easier to grok for someone who doesn't have a strong programming background, at the cost of some flexibility. Some features that bridge the gap: 1. Command-line tools are often used in steps of a bioinformatics pipeline. The workflow managers expect this and make them easier to use (e.g. https://github.com/snakemake/snakemake-wrappers). 2. Using file I/O to explicitly construct a DAG is built-in, which seems easier to understand for researchers than constructing DAGs from functions. 3. Built-in support for executing on a cluster through something like SLURM. 4. Running "hacky" shell or R scripts in steps of the pipeline is well-supported. As an aside, it's surprising how often a mis-implemented subprocess.run() or os.system() call causes issues. 5. There's a strong community building open-source bioinformatics pipelines for each workflow manager (e.g. nf-core, warp, snakemake workflows). Airflow – and the other less domain-specific workflow managers – are arguably better for people who have a stronger software engineering basis. For someone who moved wet lab to dry lab and is learning to code on the side, I think the bioinformatics workflow managers lower the barrier to entry. |
As someone who is a software developer in the bioinformatics space (as opposed to the other way around) and have spent over 10 years deep in the weeds of both the bioinformatics workflow engines as well as more standard ones like Airflow - I still would reach for a bioinfx engine for that domain.
But - what I find most exciting is a newer class of workflow tools coming out that appear to bridge the gap, e.g. Dagster. From observation it seems like a case of parallel evolution coming out of the ML/etc world where the research side of the house has similar needs. But either way, I could see this space pulling eyeballs away from the traditional bioinformatics workflow world.