Hacker News
by magv 2130 days ago
An interesting concern is that there often is no single piece of code that has produced the results of a given paper.

Often it is a mixture of different (and evolving) versions of different scripts and programs, with manual steps in between. Often one starts the calculation with one version of the code, identifies edge cases where it is slow or inaccurate, develops it further while the calculations are running, does the next step (or re-does a previous one) with the new version, possibly modifying intermediate results manually to fit the structure of the new code, and so on -- the process is interactive, and not trivially repeatable.

So the set of code one has at the end is not the code the results were obtained with: it is just the code with the latest edge case fixed. Is it able to reproduce the parts of the results that were obtained before it was written? One hopes so, but advanced research may take months of computer time and require machines with memory, disk, CPU, GPU, or network capacity only available in a given lab, so it is not at all easy to verify.

2 comments

>the process is interactive, and not trivially repeatable.

The kind of interaction you're describing should be frowned upon. It requires the audience to trust that the manual data edits are no different from rerunning the analysis. But the researcher should just rerun the analysis.

Also, mixing old and new results is a common problem in manually updated papers. It can be avoided by using reproducible research tools like R Markdown.

If it can't be trivially repeated, then you should publish what you have with an explanation of how you got it. Saying that "the researcher should just rerun the analysis" does not take into account that this could be very expensive and that you can learn a lot from observations that come from messy systems. Science is about more than just perfect experiments.

And any such "research" should go in the bin. Reproducibility of final results and their review is key.

No, you should publish this research and be clear about how it all worked out, and someone will reproduce it in their own way.

Reproducibility isn't usually about having a button to press that magically gives you the researchers' results. It's also not always a set of perfect instructions. More often it is documentation of what happened and what was observed, to the extent the researchers believe it is important to understanding the research questions. Sometimes we don't know what's important to document, so we try to document as much as possible. This isn't always practical, and sometimes it is obviously unnecessary.
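
For what it's worth, here is a minimal sketch of what "document as much as possible" could look like for the kind of evolving, partly manual workflow described at the top of the thread. It is only an illustration, not anyone's actual tooling: the names `file_sha256` and `log_step`, the record fields, and the JSON-lines log format are all assumptions made up for this example. The idea is to append one record per step (including manual ones) with hashes of the code and data files involved, so that one can later tell which version of the code produced which intermediate result.

```python
import hashlib
import json
import time


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large data files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_step(log_path, step, code_files=(), data_files=(), params=None, note=""):
    """Append one provenance record to a JSON-lines log and return it.

    Hypothetical record layout: which step ran, hashes of the code and
    data files involved, the parameters used, and a free-text note
    (useful for describing manual edits).
    """
    record = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "step": step,
        "code": {str(p): file_sha256(p) for p in code_files},
        "data": {str(p): file_sha256(p) for p in data_files},
        "params": params or {},
        "note": note,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

A manual step would be logged the same way, for example `log_step("provenance.jsonl", "fix-columns", data_files=["intermediate.csv"], note="renamed columns by hand to match the new reader")`, where the file names are again placeholders. This does not make the work repeatable at the press of a button, but it does record what happened and in what order.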