Hacker News
by matthew_stone 1062 days ago
Yes, it's a DSL.

Here's a simple scatter-gather example. Let's say you want to count the number of lines in each file for a list of samples, and report a table of counts collected from each sample. Define a rule to process each input file, and a rule to collect the results.

I find this much less complex than an equivalent bash workflow. Additionally, each rule can easily be containerized, independent jobs can run in parallel, and the workflow is robust to interruption and to the addition of new samples. Snakemake checks which output files already exist and runs only the rules needed to create the missing ones, logic that is much more finicky to implement by hand in bash.

    # One sample name per line; blank lines are skipped.
    with open('data/samples.txt') as slist:
        SAMPLES = [line.strip() for line in slist if line.strip()]
    
    # Default target: asking for the final table pulls in every job upstream of it.
    rule all:
        input:
            "results/line_counts.txt"
    
    # Scatter: one job per sample; Snakemake infers {sample} from each requested output path.
    rule count_lines:
        input:
            "data/lines/{sample}.txt"
        output:
            "processed/count_lines/{sample}.txt"
        shell:
            """
            wc -l < {input} |
              paste <(echo {wildcards.sample}) - > {output}
            """
    
    # Gather: expand() lists one count file per sample, so this rule waits for all of them.
    rule collect_counts:
        input:
            expand("processed/count_lines/{sample}.txt", sample=SAMPLES)
        output:
            "results/line_counts.txt"
        shell:
            """
            cat <(echo -e "sample\tn_lines") {input} > {output}
            """
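For a sense of the bookkeeping Snakemake takes over, here is a hand-rolled sketch of the same scatter-gather in plain Python. It is an illustration, not how Snakemake is implemented: `needs_run` and `run_workflow` are hypothetical names, the paths mirror the Snakefile above, and staleness is decided by the classic make-style rule (rebuild when an output is missing or older than any of its inputs).

```python
from pathlib import Path


def needs_run(output: Path, inputs: list[Path]) -> bool:
    """Make-style rule: run if the output is missing or older than any input."""
    if not output.exists():
        return True
    built_at = output.stat().st_mtime
    return any(p.stat().st_mtime > built_at for p in inputs)


def run_workflow(root: Path, samples: list[str]) -> list[str]:
    """Scatter one count job per sample, then gather them into one table.

    Hypothetical helper for illustration; returns the names of jobs that ran.
    """
    jobs_run: list[str] = []
    count_files: list[Path] = []

    # Scatter: the equivalent of rule count_lines, once per sample.
    for sample in samples:
        src = root / "data" / "lines" / f"{sample}.txt"
        dst = root / "processed" / "count_lines" / f"{sample}.txt"
        count_files.append(dst)
        if needs_run(dst, [src]):
            n_lines = src.read_bytes().count(b"\n")  # newline count, like `wc -l`
            dst.parent.mkdir(parents=True, exist_ok=True)
            dst.write_text(f"{sample}\t{n_lines}\n")
            jobs_run.append(f"count_lines[{sample}]")

    # Gather: the equivalent of rule collect_counts, rerun only if the table
    # is missing or older than one of the per-sample count files.
    table = root / "results" / "line_counts.txt"
    if needs_run(table, count_files):
        table.parent.mkdir(parents=True, exist_ok=True)
        rows = "".join(p.read_text() for p in count_files)
        table.write_text("sample\tn_lines\n" + rows)
        jobs_run.append("collect_counts")
    return jobs_run
```

Calling `run_workflow` twice in a row runs every job the first time and nothing the second; adding a sample to the list reruns only that sample's count and the gather step (timestamp comparisons are only as fine-grained as the filesystem's mtimes). This sketch still ignores parallel execution and recovery from half-finished jobs, which is the rest of what the Snakefile gets for free.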

This looks like... an unconstrained amalgamation of YAML, Python, and zsh/bash. Knowing all these building blocks quite well, I cannot say I find this attractive at all. It seems bound to incur all the problems that having everything in YAML inflicted on configuration management and deployment orchestration in Ansible.