| My main gripe with them is that the dependencies between the cells are implicit and the results of each cell are resident in memory instead of a more durable form (like artifacts on disk). The best way to understand why I hate notebooks is to contrast them with my workflow: when I do data science, each step in the process is represented by a Makefile target which lists its products and its dependencies explicitly. All products are represented as concrete objects on disk (JSON, CSV, figures, or serialized representations in some form). If I need to do reporting, it's typical for various targets to generate fragments of LaTeX and for the report (a PDF) to explicitly document which fragments of LaTeX and which figures belong in the report. Then a simple `make report.pdf` is enough to generate the final result. If I change something I can explicitly see which pieces need to be rebuilt and how. I also believe that the structure of a notebook, which mixes code and reporting, encourages bad software design practices like copy-pasting - it doesn't naturally encourage refactoring of shared code into libraries or anything like that. Most Jupyter notebooks are just a pile of shit, basically. The big problem is that all the cells in a notebook share one big, mutable, global state. This is wrong. They also don't work well with git, which I view to be the absolute crux of any successful technical project. |
The crux is that Jupyter was not made for people with your skills or, more accurately, your working pattern. What you do is not inherently hard or complicated, certainly not more so than actual data science. It's mostly a cultural thing.