by magv
2130 days ago
An interesting concern is that there often is no single piece
of code that has produced the results of a given paper. Often
it is a mixture of different (and evolving) versions of
different scripts and programs, with manual steps in between.
Often one starts the calculation with one version of the code,
identifies edge cases where it is slow or inaccurate, develops
it further while the calculations are running, does the next
step (or re-does a previous one) with the new version, possibly
modifying intermediate results manually to fit the structure of
the new code, and so on -- the process is interactive, and not
trivially repeatable.
So the set of code one has at the end is not the code the results
were obtained with: it is just the code with the latest edge case
fixed. Is it able to reproduce the parts of the results that were
obtained before it was written? One hopes so, but given that
advanced research may take months of computer time and machines
with high memory/disk/CPU/GPU/network speed requirements only
available in a given lab -- it is not at all easy to verify.
|
The kind of interaction you're describing should be frowned upon. It requires the audience to trust that the manual data edits are no different from rerunning the analysis. But the researcher should just rerun the analysis.
Also, mixing old and new results is a common problem in manually updated papers. It can be avoided by using reproducible research tools like R Markdown.
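
To make the "mixing old and new results" problem concrete, here is a minimal sketch of one way to detect it. This is not R Markdown or any existing tool, just a hypothetical Python illustration: it stores hashes of the code, inputs, and result next to each result, so a later check can tell whether the code changed or the result was edited by hand after it was produced. The function names and the `.manifest.json` suffix are assumptions made up for this example.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_manifest(result_path, code_paths, input_paths):
    """Record hashes of the code, inputs, and result in <result>.manifest.json.

    Hypothetical helper: call it right after a result file is produced.
    """
    manifest = {
        "code": {str(p): file_digest(p) for p in code_paths},
        "inputs": {str(p): file_digest(p) for p in input_paths},
        "result": {str(result_path): file_digest(result_path)},
    }
    manifest_path = Path(str(result_path) + ".manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest_path


def changed_since_recorded(manifest_path):
    """List recorded files that are missing or whose contents no longer match."""
    manifest = json.loads(Path(manifest_path).read_text())
    changed = []
    for section in ("code", "inputs", "result"):
        for path, digest in manifest[section].items():
            if not Path(path).exists() or file_digest(path) != digest:
                changed.append(path)
    return changed
```

If `changed_since_recorded` returns a script path, the result was produced by an older version of that script; if it returns the result path, the result was modified after it was written. This only flags the mismatch. It does not rerun anything, and it does not help when the rerun itself costs months of computer time.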