by jbl0ndie 695 days ago
Isn't that the idea (or perhaps the promise) of languages like R or notebook tools like Jupyter or Colab, which provide a means to ingest, clean, analyse and present your data, then share the code you've used to do that?
2 comments

Notebooks aren't very git-friendly, so in practice you rarely know which version produced the paper.

The fact you can run notebook cells out-of-order exacerbates this problem. Not only do you not know what version the entire file was, you also don't know in what order or how many times each cell within the file executed in order to produce the plots you see in the paper.
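
For what it's worth, a saved .ipynb does keep some evidence of this: it is a JSON file, and each code cell carries an `execution_count`. Here is a rough sketch of a check you could run before committing. The function name `execution_order_issues` is made up for illustration, and it only reads the `cells`, `cell_type` and `execution_count` fields. It can flag a notebook that was clearly run out of order, but a clean result does not prove the saved outputs match the saved code, since cells can be edited after they ran.

```python
import json


def execution_order_issues(notebook_json: str) -> list[str]:
    """Return human-readable warnings about the execution order of a notebook.

    `notebook_json` is the text content of an .ipynb file.
    """
    nb = json.loads(notebook_json)

    # Only code cells have an execution counter; markdown cells are skipped.
    counts = [
        cell.get("execution_count")
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"
    ]

    issues = []
    if any(c is None for c in counts):
        issues.append("some code cells were never executed")

    executed = [c for c in counts if c is not None]
    if executed != sorted(executed):
        issues.append("cells were executed out of document order")

    # A fresh top-to-bottom run numbers the executed cells 1, 2, ..., N.
    if executed and sorted(executed) != list(range(1, len(executed) + 1)):
        issues.append(
            "execution counts do not run 1..N "
            "(cells were re-run, deleted, or run across sessions)"
        )
    return issues
```

You would call it with something like `execution_order_issues(open("analysis.ipynb").read())` (the filename is a placeholder) and refuse to commit if the list is non-empty.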

This isn't to discount the improvement in UX that you get from notebooks compared to my preferred alternative (emacs with org-mode). Maybe I'm just bitter that the ipynb format exists at all. If notebooks were just a UX layered on top of emacs+org-mode, that would fix most of the core issues.

I like notebooks; they are a useful tool. But they are just a slight adjustment to the programming model and an alternative type of IDE, and they do not do much for reproducibility. Versioning data, software and dependencies is much more important, as is verifying that the code actually runs on another machine and produces the correct results. Setting up CI for the project, with basic end-to-end tests, is the minimum I set for my own research (in applied machine learning).
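
To make the versioning point concrete, here is a minimal sketch of a provenance record you could write next to each result: hashes of the input data plus the versions of the packages that were installed. The names `file_sha256` and `build_manifest` are invented for this example, and this is only one possible approach, not a substitute for a lock file or a container image.

```python
import hashlib
import platform
from importlib import metadata


def file_sha256(path: str) -> str:
    """Hash a file in 1 MiB chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_paths, packages):
    """Collect the interpreter version, package versions and data hashes.

    Packages that are not installed are recorded as None rather than raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None

    return {
        "python": platform.python_version(),
        "packages": versions,
        "data_sha256": {path: file_sha256(path) for path in data_paths},
    }
```

An end-to-end test in CI can then rebuild the manifest on a clean machine and compare it with the committed one before checking the results themselves.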