by ginko 2251 days ago
I'm feeling the same way. I really like the idea of Jupyter, but it should be a native application, not something running in a browser. Maybe it's also me not being used to working with notebooks, but I find it strange that I have to manually re-evaluate all following cells when I change an earlier one. Shouldn't that just happen automatically?

I dabbled a bit with EIN[1], an Emacs client for Jupyter, but it didn't work all that well for me. In particular, it didn't play well with my dark color scheme, and you still needed to run a Jupyter server for it to connect to.

[1] http://millejoh.github.io/emacs-ipython-notebook/

There is https://github.com/dzop/emacs-jupyter

    #+BEGIN_SRC jupyter-python :session py :display plain
    import pandas as pd
    data = [[1, 2], [3, 4]]
    pd.DataFrame(data, columns=["Foo", "Bar"])
    #+END_SRC
    
    #+RESULTS:
    :    Foo  Bar
    : 0    1    2
    : 1    3    4

> I find it strange that I manually have to reevaluate all following cells when I change an earlier one. Shouldn't that just happen automatically?

I get where you are coming from but this would ruin a lot of my data analysis stuff. These are cases where I have 30 minute queries in the lower cells. I don't want those to fire every time.

What might be a nice addition is the ability to either 1) clear the output of all those cells, or 2) mark those cells as inconsistent.
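To make option 2 concrete, here is a rough sketch of how a notebook could decide which later cells to flag. It is not how Jupyter works today. It assumes cells are plain strings of Python source, and `names_defined_and_used` and `stale_cells` are made-up names. The dependency check is a name-level heuristic built on the standard `ast` module, so it over-approximates (it treats function-local names as if they were globals) and it cannot see in-place mutation.

```python
import ast


def names_defined_and_used(source):
    """Return (defined, used) sets of names appearing in one cell's source."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as b` binds `b`
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
    return defined, used


def stale_cells(cells, changed_index):
    """Indices of later cells that read a name touched by the changed cell."""
    dirty, _ = names_defined_and_used(cells[changed_index])
    stale = []
    for i in range(changed_index + 1, len(cells)):
        defined, used = names_defined_and_used(cells[i])
        if used & dirty:
            stale.append(i)
            dirty |= defined   # everything this cell produces is now suspect
        else:
            dirty -= defined   # an unaffected cell rebinding a name shadows it
    return stale


cells = [
    "df = load()",
    "clean = df.dropna()",
    "other = 1",
    "report = summarize(clean, other)",
]
print(stale_cells(cells, 0))  # [1, 3]
```

The point is that the expensive cells only get marked, never re-run, so the 30 minute queries stay under manual control.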

That being said, there are enough other foot-guns besides out-of-order execution in jupyter notebooks. The number of times that persisted variables defined in long-deleted cells have masked bugs is more than I'd care to admit.

> The number of times that persisted variables defined in long-deleted cells have masked bugs is more than I'd care to admit.

Definitely. It happens particularly often when you rename a variable (and modify its definition) and forget to update the name where it is used further down in the notebook. Suddenly, without any warning, you are using stale data for your analysis, which can really throw a wrench in things. It would be nice if, when you changed a cell, it erased all the old definitions from that cell.
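A toy version of that "erase the old definitions" idea, with a plain dict standing in for the kernel's global namespace. Everything here is hypothetical (`run_cell`, the `owners` bookkeeping, the cell ids); Jupyter does not do this as far as I know. It only tracks names a cell newly introduced, so rebinding a name owned by another cell is not handled.

```python
def run_cell(namespace, owners, cell_id, source):
    """Run `source`, first deleting names this cell introduced on its last run."""
    for name in owners.get(cell_id, set()):
        namespace.pop(name, None)
    before = set(namespace)
    exec(source, namespace)
    # exec() adds "__builtins__" to the dict; that is not a user definition
    owners[cell_id] = set(namespace) - before - {"__builtins__"}


namespace, owners = {}, {}
run_cell(namespace, owners, "c1", "prices = [10, 20, 30]")
run_cell(namespace, owners, "c2", "total = sum(prices)")

# Cell c1 is edited: the variable is renamed and the data changes.
run_cell(namespace, owners, "c1", "prices_usd = [11, 22, 33]")

# Cell c2 was not updated. Because the old `prices` was erased, re-running it
# fails loudly instead of quietly summing the stale list.
try:
    run_cell(namespace, owners, "c2", "total = sum(prices)")
except NameError as err:
    print("caught:", err)
```

With an ordinary kernel the last step would succeed and `total` would still be 60, which is exactly the silent stale-data case.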

And I don't think I'm the only one who doesn't trust their notebook definitions. I've noticed that pretty much everyone who uses notebooks does the same thing once they finish an analysis: they restart the kernel and rerun the entire notebook from start to finish, because they have little faith that the results shown are actually derived from the cells currently in the notebook.

This is something that I struggled with as well. Back before Jupyter was a big thing I wrote a system called bein (https://github.com/madhadron/bein) that promoted the individual execution to the primary artifact instead of the source code, since that's usually what data analysis and other computational science work really cares about.

Based on my intervening (looks at git timestamps) decade of thinking, I would probably approach it differently, but I think the key point remains correct: the execution is the artifact, and what you want is to trace its provenance rather than to track source code.
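To illustrate what "execution as the artifact" could look like, here is a minimal sketch. It is not bein's API or design; `run_recorded`, `lineage`, and the record fields are invented for this example. Each run executes in a fresh namespace seeded only from the runs it declares as parents, and the returned record carries the code, the parent ids, and the values produced. In-place mutation of an inherited object is not detected.

```python
import hashlib
import json
import time


def run_recorded(source, parents=()):
    """Execute `source` in a fresh namespace and return a record of that run."""
    namespace = {}
    for parent in parents:
        namespace.update(parent["outputs"])
    inherited = dict(namespace)
    exec(source, namespace)
    # Outputs are names that are new or were rebound to a different object.
    outputs = {
        name: value
        for name, value in namespace.items()
        if name != "__builtins__"
        and (name not in inherited or inherited[name] is not value)
    }
    parent_ids = [p["id"] for p in parents]
    # The id covers the code and its lineage, so the same code run on
    # different upstream records gets a different id.
    digest = hashlib.sha256(json.dumps([source, parent_ids]).encode("utf-8"))
    return {
        "id": digest.hexdigest()[:12],
        "source": source,
        "parents": parent_ids,
        "outputs": outputs,
        "finished_at": time.time(),
    }


def lineage(record, store):
    """Ids of every record this one was derived from, nearest first."""
    seen, pending = [], list(record["parents"])
    while pending:
        record_id = pending.pop(0)
        if record_id not in seen:
            seen.append(record_id)
            pending.extend(store[record_id]["parents"])
    return seen


store = {}
load = run_recorded("values = [3, 1, 2]")
store[load["id"]] = load
stats = run_recorded("mean = sum(values) / len(values)", parents=[load])
store[stats["id"]] = stats

print(stats["outputs"])        # {'mean': 2.0}
print(lineage(stats, store))   # the id of the `load` run
```

Because a run can only see what its parents produced, there is no leftover state from deleted cells to mask a bug, and any result can be traced back through `lineage`.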