| Exactly our experience too, from complex machine learning workflows in various aspects of drug discovery. We did not find any of the popular DSL-based bioinformatics pipeline tools (Snakemake, bpipe, etc.) to fit the bill. Nextflow came close, and it does in fact allow a fair amount of custom code too. What worked for us was Spotify's Luigi, which is a Python library rather than a DSL. The only catch was that we had to develop a flow-based-inspired API on top of Luigi's more functional-programming-style one, in order to make dependencies fluent and easy enough to define for our complex workflows. Our flow-based-inspired Luigi API for complex workflows (SciLuigi) is available at: https://github.com/pharmbio/sciluigi We wrote up a paper on it as well, detailing a lot of the design decisions behind it: http://dx.doi.org/10.1186/s13321-016-0179-6 Lately we have been working on a pure Go alternative to Luigi/SciLuigi, since we realized that with the flow-based paradigm we could just as well rely on Go channels and goroutines to create an "implicit scheduler" very simply and robustly. This is work in progress, but a lot of example workflows already work well (it has about three times fewer LOC than a recent bioinformatics pipeline tool written in Python and put into production). Code available at: https://github.com/scipipe/scipipe It is also very much a programming library rather than a DSL. It even implements streaming via named pipes, which seems to allow operations somewhat similar to dgsh's, probably with a bit more code, but with the (apparent) benefit of somewhat easier handling of multiple inputs and outputs (via the ports concept from flow-based programming). dgsh looks really interesting for simpler operations where there is one main input and output though, which in our experience occur a lot in ad-hoc work in the shell. Will have to test it out for sure! |
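
To make the "implicit scheduler" idea a bit more concrete, here is a minimal sketch in plain Go. It is not SciPipe's actual API: the names `source`, `upper`, `join` and `runPipeline` are hypothetical and made up for illustration. Each process runs in its own goroutine, each port is a channel, and wiring the channels together is the whole dependency graph. Nothing schedules anything explicitly; a process simply blocks until its in-ports have data.

```go
package main

import (
	"fmt"
	"strings"
)

// source is a hypothetical process with a single out-port. It emits
// each item and closes the port when it is done.
func source(items []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, it := range items {
			out <- it
		}
	}()
	return out
}

// upper is a hypothetical one-in, one-out process.
func upper(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for s := range in {
			out <- strings.ToUpper(s)
		}
	}()
	return out
}

// join is a hypothetical two-in, one-out process. It pairs the n-th
// item on the left in-port with the n-th item on the right in-port.
func join(left, right <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for l := range left {
			r, ok := <-right
			if !ok {
				// Right port ran dry: drain left so its sender can exit.
				for range left {
				}
				return
			}
			out <- l + ":" + r
		}
		// Left port ran dry: drain right so its sender can exit.
		for range right {
		}
	}()
	return out
}

// runPipeline wires the processes together and collects the results.
// The channel connections alone determine the execution order.
func runPipeline(names, values []string) []string {
	joined := join(upper(source(names)), source(values))
	var results []string
	for s := range joined {
		results = append(results, s)
	}
	return results
}

func main() {
	for _, line := range runPipeline([]string{"a", "b"}, []string{"1", "2"}) {
		fmt.Println(line)
	}
}
```

The `join` process is the part that is awkward in a plain shell pipeline: it has two in-ports, and in this style that is just one more channel argument. A real tool would of course add things like file-based targets, shell command execution and error handling on top; this only shows the channel wiring.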