
by improbable22 2794 days ago

I think the GP is an insightful comment. Reproducing things is indeed important, but re-running code is much too narrow a definition of that, and possibly a distractingly narrow one.

Maybe your awful notebook gets the same answer you got the day before on the blackboard. Or the same answer your collaborator got independently, perhaps with different tools. Those might be great checks that you understand what you're doing. Spending time on them might be more valuable for finding errors than spending time on making one approach run without human intervention.

Not to say that there aren't some scientists who would benefit from better engineering. But it's too strong to say that fixing everything that looks wrong to an engineer's eyes is automatically a good idea.

1 comment

analog31 2794 days ago

I find that with Jupyter, re-running code does serve one useful purpose, which is to make sure that your result isn't affected by out-of-order execution or a global that you declared and forgot about. That is a real pitfall of Jupyter that has to be explained to beginners.
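To make the forgotten-global pitfall concrete, here is a rough sketch in plain Python rather than a real notebook. It imitates a kernel by running "cells" (just strings) with `exec` in a shared namespace. The names `run_cells`, `scale`, and `notebook` are made up for illustration, and this is only an approximation of what a real Jupyter kernel does.

```python
def run_cells(cells, namespace):
    """Execute each cell's source in one shared namespace, roughly like a kernel."""
    for source in cells:
        exec(source, namespace)
    return namespace

# The cells that are still visible in the (hypothetical) notebook.
notebook = [
    "data = [1, 2, 3]",
    "result = sum(x * scale for x in data)",
]

# Live session: a cell that has since been deleted once defined a global.
live = {}
exec("scale = 10", live)  # the forgotten global, still in memory
live_result = run_cells(notebook, live)["result"]

# Fresh session: the same visible cells, run top to bottom with no leftovers.
try:
    run_cells(notebook, {})
    fresh_error = None
except NameError as err:
    fresh_error = err

print("live kernel result:", live_result)
print("fresh run error:", fresh_error)
```

The live session quietly produces a number, while the clean run fails because `scale` is never defined in any visible cell. In a real notebook the equivalent check is restarting the kernel and running all cells from the top.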

For my work, reproducing a result may involve collecting more data, because a notebook might be a piece of a bigger puzzle that includes hardware and physical data. This is where scripting is a two-edged sword. On the one hand, it's easy to get sloppy in all of the ways that horrify real programmers. On the other hand, scripting an experiment so it runs with little manual intervention means that you can run it several times.
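As a rough sketch of that last point: once a run is wrapped in a function, repeating it and looking at the spread is only a few lines. Everything here is hypothetical. `run_experiment` is a stand-in that fakes a noisy reading with a seeded random generator, where a real version would talk to the hardware.

```python
import random
import statistics

def run_experiment(seed):
    """Hypothetical stand-in for one automated run: noisy readings around 1.0."""
    rng = random.Random(seed)  # seeded so each simulated run is repeatable
    readings = [1.0 + rng.gauss(0, 0.05) for _ in range(100)]
    return statistics.mean(readings)

def repeat(n_runs):
    """Run the experiment n_runs times and summarize the per-run results."""
    results = [run_experiment(seed) for seed in range(n_runs)]
    return statistics.mean(results), statistics.stdev(results)

mean, spread = repeat(5)
print(f"mean of runs: {mean:.4f}, run-to-run spread: {spread:.4f}")
```

The run-to-run spread is the part you can't get from a single hand-operated run, which is the main payoff of scripting it.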