I want to voice my support for keeping "experiments" as the driving use case.

I'm coming from the perspective of a software engineer here. To a software engineer, a "program" is a collection of stateless routines and behavior. Data is external and separate; the same program should be able to process a wide range of data. "Reproducibility", as much as that matters, means having a tested system that responds in a predictable and reliable way to inputs, and data is one such input.

When I first worked extensively with a scientist on an experiment, I was shocked at how much common wisdom from computer science was turned on its head. One is expected to load up a MATLAB workspace with data and code all in the same file? Scripts irreversibly mutate data, and often run exactly once? How could one possibly keep track of such an environment? How does one fix bugs in a series of commands typed into an interactive prompt? Reproducibility to a scientist is a log of actions that could be repeated by another human, but the environments in use often just dropped that log on the floor, to be caught only by the most diligent researcher with an unusually well-kept notebook.

I think there is definitely a happy medium somewhere. Reproducibility as a scientist understands it; interactivity in a way that makes sense to a scientist writing a one-off script. Program state stored easily, so that the scientist doesn't feel lost every time they restart their environment, as I imagine they would if they edited Python scripts in Vim the way a software engineer might. But all this in a world where scripts can be maintained, versioned, and fixed without anyone's hair catching fire.