| HN Mirror


by KMag 1880 days ago

It improves reproducibility, consistency, and sharing, but reduces convenience for some operations. It's a trade-off in favor of programming in the large.

If you don't want to recompute dependent nodes, use new names for your experiments rather than redefining old functions and variables. Yes, in some ways this is less convenient for you, but it's more convenient for the people receiving your notebooks, because the notebook is always in a consistent, reproducible state.
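To make the difference concrete, here is a toy Python model of a reactive notebook. It is only a sketch of the idea, not how any real notebook is implemented: `ReactiveNotebook`, `define`, and the `runs` log are names made up for illustration, and the dependency walk is deliberately naive.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class ReactiveNotebook:
    """Toy model of a reactive notebook: each name is defined by exactly one cell."""

    def __init__(self) -> None:
        # name -> (names it depends on, function computing its value)
        self.cells: Dict[str, Tuple[List[str], Callable[..., object]]] = {}
        self.values: Dict[str, object] = {}
        self.runs: List[str] = []  # log of evaluated cells, in order

    def define(self, name: str, deps: Sequence[str], fn: Callable[..., object]) -> None:
        """Define or redefine a cell, then run it and everything downstream of it."""
        self.cells[name] = (list(deps), fn)
        self._run(name)

    def _run(self, name: str) -> None:
        deps, fn = self.cells[name]
        self.values[name] = fn(*(self.values[d] for d in deps))
        self.runs.append(name)
        # Naive depth-first propagation: fine for a chain, but a diamond-shaped
        # dependency graph would be re-run more than once.
        for other, (other_deps, _) in self.cells.items():
            if name in other_deps:
                self._run(other)


nb = ReactiveNotebook()
nb.define("data", [], lambda: [1, 2, 3])
nb.define("total", ["data"], lambda data: sum(data))
nb.runs.clear()

# Redefining `data` forces the dependent cell `total` to be recomputed.
nb.define("data", [], lambda: [10, 20, 30])
redefine_runs = list(nb.runs)
nb.runs.clear()

# Using a new name for the experiment leaves the existing cells alone.
nb.define("data_experiment", [], lambda: [100, 200])
new_name_runs = list(nb.runs)
```

In this model the redefinition re-runs both `data` and `total`, while the new name only runs its own cell, which is the trade-off described above.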

Maybe it doesn't work well for your workflow, particularly if you're not sharing notebooks and you keep your notebooks small. On the other hand, if your workflow relies heavily on leaving notebooks in an inconsistent state, changing it may save you significant frustration as your notebooks grow, and spare you the work you would otherwise lose when you lose track of the inconsistencies you were keeping in your head.

Also, if you hit a state that you really don't want to lose, you should probably do a quick git commit. You can always squash commits later if needed.

It might be worth changing your workflow, or it might not.


ChrisRackauckas 1880 days ago

I think this is the interesting point though. Many people want to use Jupyter notebooks so that their work looks reproducible, not to make it actually reproducible. God forbid it actually has to be rerun, it could have different results!

I think that's my main notebook gripe: they make it look like if you run the code you'll get these results, but that's not even close to the case. Many people abuse this. At this point, I pretty much assume anything in a Jupyter notebook isn't reproducible.


lungben 1879 days ago

Yes. A Jupyter notebook is only reproducible in my opinion if you can hit "Restart Kernel and execute all cells" and get the same result.

Otherwise, it should never be shared with other people, or even contain relevant analysis you may need for yourself later.
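A rough way to picture the "restart and execute all cells" test is the Python sketch below. It is a toy that treats a notebook as a list of source strings, not a tool for real `.ipynb` files; `run_all_fresh` and `is_reproducible` are hypothetical helper names, and the comparison only works for plain data (functions defined in cells would compare unequal across runs).

```python
from typing import Dict, List


def run_all_fresh(cells: List[str]) -> Dict[str, object]:
    """Execute cells top to bottom in a brand-new namespace (a stand-in for a restarted kernel)."""
    namespace: Dict[str, object] = {}
    for source in cells:
        exec(source, namespace)
    # exec() adds a __builtins__ entry; drop it so only user-defined names remain.
    namespace.pop("__builtins__", None)
    return namespace


def is_reproducible(cells: List[str]) -> bool:
    """Return True if two independent fresh runs both succeed and end with equal variables."""
    try:
        return run_all_fresh(cells) == run_all_fresh(cells)
    except Exception:
        # A cell that fails from a clean start means the notebook relied on hidden state.
        return False


ordered = ["x = 2", "y = x * 10"]

# This order can appear to work interactively if `x` was defined earlier in the
# session, but it fails as soon as the kernel is restarted.
out_of_order = ["y = x * 10", "x = 2"]

ordered_ok = is_reproducible(ordered)
out_of_order_ok = is_reproducible(out_of_order)
```

The out-of-order notebook is exactly the kind that looks fine on screen and breaks for whoever receives it.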

But this is not enough: the library dependencies also need to be pinned. Pluto will make this very easy in the near future: https://github.com/fonsp/Pluto.jl/pull/844
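For comparison, on the Python side pinning can be approximated by recording the exact installed versions next to the notebook. This is a minimal sketch using the standard library's `importlib.metadata`, and it says nothing about how the Pluto pull request works; `snapshot_versions` and `to_pinned_lines` are illustrative names, and the package names passed in are up to you.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, List, Optional


def snapshot_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Record the installed version of each package, or None if it is not installed."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def to_pinned_lines(snapshot: Dict[str, Optional[str]]) -> List[str]:
    """Format a snapshot as sorted `name==version` lines, skipping missing packages."""
    return [
        f"{name}=={ver}"
        for name, ver in sorted(snapshot.items())
        if ver is not None
    ]


# Example with a hand-written snapshot, so the output does not depend on
# what happens to be installed on this machine.
example_lines = to_pinned_lines({"b": "2.0", "a": "1.0", "c": None})
```

Saving those lines alongside the notebook gives a reader at least a chance of rebuilding the same environment.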
