by bpicolo 3383 days ago
Jupyter notebooks are a big piece of solving ML reproducibility, it feels like.

I see this a lot, but I disagree, at least in their current form. They are missing several pieces that are key for reproducibility (which, to be fair, was not their original goal).

* Dependencies like libraries are not specified anywhere.

* Dependencies on local code are not bundled.

* Dependencies on local data are not bundled.

* Underlying system requirements are not specified either, e.g. LLVM (which needs to be specifically 3.9.X for llvmlite in Python, as I discovered recently).

* Perhaps most dangerously, you can run the code sections out of order, and deleted sections leave their variables behind, which can interfere with later runs. I've been caught out by this in my own notebooks.
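To make the last point concrete, here is a rough sketch of a check you could run on a saved notebook. It relies on the fact that an .ipynb file is JSON in which each code cell records an `execution_count`; a restart followed by a top-to-bottom run should number the non-empty code cells 1, 2, 3, and so on. The function names are made up for illustration, and passing this check does not prove the kernel state was clean, it only catches the obvious cases.

```python
import json


def cell_source(cell: dict) -> str:
    """Return a cell's source as one string (it may be stored as a list of lines)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def execution_order_problems(notebook: dict) -> list:
    """List reasons the saved notebook does not look like a clean top-to-bottom run."""
    problems = []
    expected = 1  # a fresh kernel numbers the first executed cell 1
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no execution count
        if not cell_source(cell).strip():
            continue  # assumption: empty code cells are never executed
        count = cell.get("execution_count")
        if count is None:
            problems.append(f"cell {index}: never executed")
        elif count != expected:
            problems.append(
                f"cell {index}: execution_count {count}, expected {expected}"
            )
        expected += 1
    return problems


def load_notebook(path: str) -> dict:
    """Read an .ipynb file from disk as a plain dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Something like `execution_order_problems(load_notebook("analysis.ipynb"))` (the filename is hypothetical) returns an empty list for a clean run. A gap or a reordering in the counts shows up as a problem, which is also how a deleted-but-executed cell tends to reveal itself.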

I really like Jupyter notebooks, but I think some of the design decisions (correct for some ways of working) actively work against reproducible reports.

There was a recent writeup here:

> we were able to successfully execute only one of the ~25 notebooks that we downloaded.

https://markwoodbridge.com/2017/03/05/jupyter-reproducible-s...

Right, "a part" was important. Looks like the authors of that writeup agree.

> Technologies such as Jupyter and Docker present great opportunities to make digital research more reproducible, and authors who adopt them should be applauded.

I somewhat disagree that it's a big part, or even that it really should be a part, of the solution. I'm not sure that these notebooks are the right approach to making research reproducible. To me, the conclusion there doesn't seem supported by their findings.

I think they solve a different use case well, and forcing them into a workflow they weren't designed for may just result in both less useful notebooks and a poor experience.

Edit - To expand a little, Jupyter notebooks are a nice way to mix code and descriptions, and in essence they force people to release a certain amount of their code. But other than that, they actually provide fewer of the guarantees you want for reproducibility. And since reproducibility generally imposes more restrictions on how you work, I can see more issues arising from trying to reconcile these different ways of working.

I don't see which features are actually useful for making things reproducible, and so I don't see why people keep bringing them up as a solution.

The main steps would seem to be:

1. Make sure the results used are not generated on "my machine" but on a specified base system running somewhere else, just as we don't take the unit test results I run locally as gospel.

2. Unique and versioned identifiers for code, base system and data.

3. Archived code and data.

4. An agreed-upon format in the output data that says where it came from (referencing the identifiers for the code, base system, and input data).
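As a rough sketch of what steps 2 and 4 could look like in practice, the snippet below builds a small provenance record: content hashes for the code and data files, plus a description of the base system. The record layout and the function names are my own invention for illustration, not an agreed format, and a real setup would probably identify the base system by something stronger (a container image digest, say) than the interpreter and package versions used here.

```python
import hashlib
import platform
from importlib import metadata
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_version(name: str):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def build_provenance(code_files, data_files, packages) -> dict:
    """Describe which code, data, and base system produced a result.

    The key names are hypothetical; any consistent, documented layout would do.
    """
    return {
        "code": {str(Path(p)): sha256_of_file(Path(p)) for p in code_files},
        "data": {str(Path(p)): sha256_of_file(Path(p)) for p in data_files},
        "base_system": {
            "python": platform.python_version(),
            "platform": platform.platform(),
            "packages": {name: package_version(name) for name in packages},
        },
    }
```

The resulting dictionary is plain JSON-serializable data, so it can be written next to the output with `json.dumps(record, sort_keys=True)` and archived with the code and data it refers to.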

Your output might be a rendered notebook, but the notebook itself is entirely orthogonal to the process, as what a notebook provides is:

* A nice interface for entering the code

* A nice output format

* A neat way of mixing nicely written documentation along with the code