by neuromantik8086 2131 days ago
Just as a quick bit of context here, Konrad Hinsen has a specific agenda that he is trying to push with this challenge. It's not clear from this summary article, but if you look at the original abstract soliciting entries for the challenge (https://www.nature.com/articles/d41586-019-03296-8), it's a bit clearer that Hinsen is using this to challenge the technical merits of Common Workflow Language (https://www.commonwl.org/; currently used in bioinformatics by the Broad Institute via the Cromwell workflow manager).

Hinsen has created his own DSL, Leibniz (https://github.com/khinsen/leibniz ; http://dirac.cnrs-orleans.fr/~hinsen/leibniz-20161124.pdf), which he believes is a better alternative to Common Workflow Language. This reproducibility challenge is in support of this agenda in particular, which is worth keeping in mind; it is not an unbiased thought experiment.

2 comments

Konrad Hinsen is an expert in molecular bioinformatics, has contributed significantly to Numerical Python, and has published extensively on reproducible science and algorithms (see his blog).

The fact that he might favor different solutions from you does not mean that he is pushing some kind of hidden agenda.

If you think that Common Workflow Language is a better solution, you are free to explain in a blog why you think this.

Are you saying that the reproducibility challenge poses a difficulty to Common Workflow Language? If this is so, would that not rather support Hinsen's point - without implying that what he suggests is already a perfect solution?

I never said that Konrad Hinsen's agenda was hidden; in fact, it's not at all hidden (which is why I linked the abstract). It's just that this context isn't at all clear in the Nature write-up, and it's relevant to take into account.

I haven't taken the time to seriously contemplate the merits of CWL vs Leibniz, although my gut instinct is that we don't really need another domain-specific language for science given the profusion of such languages that already exist (Mathematica, Maple, R, MATLAB, etc). That's the extent of my bias, but again, it's a gut instinct and not a comprehensive well-reasoned argument against Leibniz.

I never answered your last question so here goes:

> Are you saying that the reproducibility challenge poses a difficulty to Common Workflow Language?

I don't actually understand how the reproducibility challenge undermines the validity of using CWL / flow-based programming as an approach to promoting reproducible analyses. There certainly wasn't anything in the article that made me think that CWL was challenged, but Hinsen explicitly called out CWL in the abstract, which implies that for some reason he thinks, a priori, that it's a non-solution. He never justifies this implied assumption further, and as near as I can tell, none of the attempted replications used a flow-based language.

If Hinsen really aimed to argue against the viability of CWL/flow-based programming as an approach to reproducibility, he would have done a systematic comparison of historical analyses that used a flow-based system (like National Instruments' LabVIEW or Prograph) vs analyses that are more similar to the approach that he seems to favor (i.e., analyses using Mathematica or Maple).

While I find the challenge interesting to follow, and the retrocomputing geek in me finds it fun, I don't actually understand what it accomplished other than being a fun diversion. Assuming that an analysis was written in a Turing-complete language and didn't use non-deterministic algorithms, you should theoretically be able to reproduce the results exactly on modern hardware. With non-deterministic algorithms, I would imagine that a result would be "close enough" within some kind of confidence interval. You may need to go to great lengths (emulating instruction sets, ripping tapes, etc.), but I think a visit to any retrocomputing festival or computer history museum would have made that pretty obvious from the outset.
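To make the deterministic vs. non-deterministic distinction concrete, here is a rough sketch in Python. The Monte Carlo estimate of pi, the function name `estimate_pi`, the seeds, and the sample counts are all made up for illustration and have nothing to do with the challenge entries. It only shows that a seeded run repeats exactly within one interpreter, while runs with different random streams agree only within a spread.

```python
import math
import random
import statistics


def estimate_pi(n_samples, seed):
    """Monte Carlo estimate of pi using a private, seeded generator."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


# Deterministic case: same seed, same code, bit-identical result.
first = estimate_pi(10_000, seed=42)
second = estimate_pi(10_000, seed=42)
exactly_reproduced = first == second

# "Non-deterministic" case, simulated here by varying the seed:
# individual results differ, but they cluster around the true value.
runs = [estimate_pi(10_000, seed=s) for s in range(20)]
mean_estimate = statistics.mean(runs)
spread = statistics.stdev(runs)

print(f"seeded runs identical: {exactly_reproduced}")
print(f"mean of 20 runs: {mean_estimate:.4f} (stdev {spread:.4f})")
print(f"distance from math.pi: {abs(mean_estimate - math.pi):.4f}")
```

In the second case "reproduced" has to mean "agrees within a stated tolerance", which is a judgment call that exact replay never has to make.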

There seems to be some misunderstanding here.

CWL is intended for stringing together other programs. It is useful for reproducibility in that it attempts to provide a fairly specific description of the runtime environment needed to execute a program, and also abstracts site-specific details such as file system layout or batch system in use. CWL platforms such as Arvados also generate comprehensive provenance traces which are vital for going back and reviewing how a data result was produced.
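As a loose illustration of what a provenance trace buys you, here is a minimal Python sketch. It is not CWL and not the Arvados API; the helper names `sha256_of` and `run_with_provenance`, the record layout, and the toy "sum the numbers" step are assumptions made up for this example. The idea is just to record the exact command, a hash of each input, and a description of the environment alongside the result.

```python
import hashlib
import json
import platform
import subprocess
import sys
import tempfile
from pathlib import Path


def sha256_of(path):
    """Content hash of a file, so the exact input can be identified later."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_with_provenance(command, input_paths):
    """Run an external program and return a record of how the result was made."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return {
        "command": list(command),
        "inputs": {str(p): sha256_of(p) for p in input_paths},
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
        },
        "stdout": result.stdout,
        "stdout_sha256": hashlib.sha256(result.stdout.encode()).hexdigest(),
    }


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "numbers.txt"
    data.write_bytes(b"3\n4\n")
    # Hypothetical workflow step: an external program that sums its input file.
    step = [
        sys.executable,
        "-c",
        "import sys; print(sum(int(l) for l in open(sys.argv[1])))",
        str(data),
    ]
    record = run_with_provenance(step, [data])
    input_hash = record["inputs"][str(data)]

print(json.dumps(record, indent=2))
```

A real workflow platform records far more than this (container images, resource requests, intermediate outputs), but even this much is enough to tell later whether a rerun used the same inputs and command.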

Leibniz seems to be a numerical computing language for describing equations, which is more similar to something like NumPy or R. It seems like an apples-and-oranges comparison.

The original call-out is weird, because CWL did not exist 10 years ago, so you can't yet answer the question of whether it facilitates running 10-year-old workflows.