|
|
|
|
|
by spiralk
643 days ago
|
|
I dislike how Jupyter notebooks have become normalized. Yes, the interactive execution and visuals are nice for more academic workflows where the priority is quick results over code organization. However, when it comes to sharing code with others for the sake of doing reproducible science, jupyter notebooks cause more trouble than they are worth. Using cell based execution with python is so elegant with '# %%' lines in regular .py files (though it requires using VSCode or fiddling with vim plugins which not all scientists want to do I suppose). No .ipynb is necessary, .py files can be version controlled and shared like normal code while sill retaining the ability to use interactively, cell by cell. Its much easier to organize .py files into a proper python module, and then share and collaborate with others. Instead, groups will collect jumbles of slightly different versions of the same jupyter notebooks that progressively become more complex and less manageable over time. It's not a hypothetical unfortunately, I've seen this happen at major university labs. I'm not blaming anyone because I understand -- the funding is there to do science and not rewrite code to build convenient software libraries. Yet, I can't help but wish jupyter notebooks could be removed from academic workflows. |
|
If the code is the end product, sure, use a python package.
But does your .py with `# %%` in it also store the outputs? If not, why even bring this up? A .py output without the plots tied to the code doesn't meet the basic use case.
If the end product is the plot, I want to see how that plot was generated. And a Jupyter notebook is a much much better artifact than a Python package, unless that Python package hard codes the inputs and execution path like a notebook would.
Over the past 20 years of my career I have run into this divergence of use cases a lot. Software engineers seem to not understand the end goals, how it should be performed, and the learnings of the practitioners that have been generating results for a long time. It's hard to protect data scientists from these inflexible software engineers that see "aha that's code, I know this!" without bothering to understand the actual use case at hand.