by nearting 1062 days ago
Cool to see that this is moving along - Jupyter merge conflicts have caused me a huge number of headaches over the years.

My solution has been to switch over to Quarto notebooks (mentioned in the post with Jupytext), but I see the issue around saving cell outputs.

I'm curious why one would specifically want to save cell outputs as is in the Jupyter notebook, rather than archiving that in some other format. Sure, that might require putting a lot of information in one page (e.g., if that output is dependent on many other code cells and their outputs), but that just moves the linkage problem around - you'd have to have some way of indicating that the specific cell output was generated by a specific version of cell code (and the order in which they were run, sometimes multiple times).

3 comments

Strongly agree with this. IIRC, in R Markdown, state is treated as a separate cache stored outside the notebook and loaded as needed. You could use something like DVC or Git LFS to manage those cache files, and since the Markdown file is plain text, use regular git to inspect changes to the notebook implementation.
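
To make the "separate cache" idea concrete, here is a rough sketch for a plain Jupyter notebook. It is not how R Markdown or any existing tool does it; `split_outputs` is a made-up helper, and keying the cache by a hash of the cell source is my own assumption about how you might link an output to the exact code that produced it. It only relies on the standard `.ipynb` JSON layout (`cells`, `cell_type`, `source`, `outputs`, `execution_count`).

```python
import copy
import hashlib
import json


def split_outputs(notebook: dict) -> tuple[dict, dict]:
    """Split a notebook dict into (code-only notebook, outputs cache).

    Hypothetical helper: the cache maps the SHA-256 of each code cell's
    source to that cell's outputs and execution count. Cells with
    identical source share one key, so the last one wins.
    """
    stripped = copy.deepcopy(notebook)  # leave the caller's dict untouched
    cache = {}
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb may store source as a list of lines
            source = "".join(source)
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        cache[key] = {
            "outputs": cell.get("outputs", []),
            "execution_count": cell.get("execution_count"),
        }
        # Clear the parts that cause noisy diffs and merge conflicts.
        cell["outputs"] = []
        cell["execution_count"] = None
    return stripped, cache


if __name__ == "__main__":
    # Tiny in-memory example instead of reading a real file.
    example = {
        "cells": [
            {"cell_type": "code", "source": "1 + 1",
             "outputs": [{"output_type": "execute_result"}],
             "execution_count": 1},
        ],
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    code_only, outputs_cache = split_outputs(example)
    print(json.dumps(code_only, indent=1))
    print(json.dumps(outputs_cache, indent=1))
```

The code-only notebook would go into regular git, and the cache file into whatever you use for large artifacts. Since the key changes whenever the cell source changes, a stale output simply stops matching. It does not capture the order in which cells were run.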

I feel like Jupyter notebooks are the PDFs of data science. They are super useful for displaying results, but they bake the data in, in a way that is super inconvenient for anything other than rendering it to look nice.

You want to share the outputs so that you can share the results while showing your work. That Jupyter also inlines everything, such that even charts are stored in the document, makes that even more necessary.

Though I can see your point, I think. Why not include a build step that moves from your document to the generated output? My gut feeling is that a large part of why the system got popular is that they worked hard on removing the friction that such a step would add.

As a comparison and to your point, I've seen people try to build "literate test suites" in a notebook, very happy with how the output looked, only to find later that some of the more common test frameworks already create very nice reports. And moving the report format/creation out of the specification allowed a ton of flexibility.

> You want to share the outputs so that you can share the results while showing your work.

What's wrong with rendering to HTML?

You often share with people who want to play with the inputs or the code, while at the same time you want to show them what your choice of inputs produced.

Right, so what's the issue with sharing a notebook (code) and rendered HTML (results)?

If someone starts playing with the inputs, they're going to lose the outputs you've created unless you have a saved rendered copy anyways

Depends what you mean by "what's wrong?" Conceptually, absolutely nothing. In practice, many of the folks we are talking about have been bitten by mismatched files already. Why add one more set of files to juggle?

> I'm curious why one would specifically want to save cell outputs as is in the Jupyter notebook

My blind guess is that it improves the readability of the notebook and promotes the literate programming mindset. But that's just a guess.