by erikgaas 2251 days ago
Been using this for work projects. A lot of raised eyebrows when people hear "Jupyter-first development", but the automated docs, flexibility with prose, inline testing, out-of-the-box pip packaging, and git integration make it well worth it. A bit of a learning curve, but very rewarding.
3 comments

I've written a lot of jupyter notebooks and honestly, emacs+org-mode is way better. Maybe if jupyter wasn't in a web browser and had vim or emacs keybindings, it would be better. Also, I'm not sure notebooks are even a good idea: it's very easy to get into an inconsistent state, and with no textual source of truth it can be very difficult to get back.

Though, thinking about it, the real problem is that when I'm using a real editor (emacs), I feel like a wizard: I know it like the back of my hand and have any number of extensions and libraries I can use. With jupyter, I'm always fighting something and there's no meaningful way to configure it to do what you want. Also, the intellisense sucks. In addition, and maybe this is silly, I find using a web browser to write code distasteful.

I'm feeling the same way. I really like the idea of jupyter, but it should be a native application, not something running in a browser. Maybe it's also me not being used to working with notebooks, but I find it strange that I manually have to reevaluate all following cells when I change an earlier one. Shouldn't that just happen automatically?

I dabbled a bit with EIN[1], an emacs client for Jupyter, but it didn't work all that well for me. In particular it didn't work well with my dark color scheme and you still needed to run a jupyter server to connect to.

[1] http://millejoh.github.io/emacs-ipython-notebook/

There is https://github.com/dzop/emacs-jupyter

    #+BEGIN_SRC jupyter-python :session py :display plain
    import pandas as pd
    data = [[1, 2], [3, 4]]
    pd.DataFrame(data, columns=["Foo", "Bar"])
    #+END_SRC
    
    #+RESULTS:
    :    Foo  Bar
    : 0    1    2
    : 1    3    4

> I find it strange that I manually have to reevaluate all following cells when I change an earlier one. Shouldn't that just happen automatically?

I get where you are coming from but this would ruin a lot of my data analysis stuff. These are cases where I have 30 minute queries in the lower cells. I don't want those to fire every time.

What might be a nice addition is the ability to either 1) clear the output of all those cells, or 2) mark those cells as inconsistent.
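To make option 2 concrete, here is a minimal sketch of how downstream cells could be flagged as inconsistent without rerunning them. It is not a Jupyter feature or API. The helper names `names_defined_and_used` and `stale_cells` are made up for illustration, and it assumes cells are plain source strings that only communicate through top-level variable names.

```python
import ast

def names_defined_and_used(source):
    """Return (defined, used) sets of variable names appearing in a cell."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return defined, used

def stale_cells(cells, changed_index):
    """Indices of later cells that read, directly or transitively,
    a name defined by the changed cell."""
    dirty_names = set(names_defined_and_used(cells[changed_index])[0])
    stale = []
    for i in range(changed_index + 1, len(cells)):
        defined, used = names_defined_and_used(cells[i])
        if used & dirty_names:
            stale.append(i)
            # Anything this cell defines is now suspect as well.
            dirty_names |= defined
    return stale

cells = [
    "raw = [3, 1, 2]",
    "clean = sorted(raw)",
    "label = 'report'",
    "summary = (label, clean[0])",
]
print(stale_cells(cells, 0))  # [1, 3]
```

Nothing is executed here, so a 30 minute query in a flagged cell would only be marked, not fired. The sketch ignores imports, function definitions, attribute mutation, and cells that redefine a name from scratch, so treat it as a rough idea rather than a reliable dependency tracker.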

That being said, there are enough other foot-guns besides out-of-order execution in jupyter notebooks. The number of times that persisted variables defined in long-deleted cells have masked bugs is more than I'd care to admit.

> The number of times that persisted variables defined in long-deleted cells have masked bugs is more than I'd care to admit.

Definitely. It happens particularly often when you change a variable name (and modify its definition) and forget to update the name where it is used further down in the notebook. Suddenly, without any warning, you are using stale data for your analysis, which can really throw a wrench in things. It would be nice if changing a cell erased all the old definitions from that cell.

And I don't think I'm the only one who doesn't have trust in their notebook definitions: I've noticed a trend among pretty much anyone who uses them that after they finish their analysis, they restart the kernel and rerun the entire notebook from start to finish as they have little faith that the results in the notebook are actually derived from the cells currently in the notebook.
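The rename scenario is easy to reproduce outside a notebook. The sketch below is only a simulation: a plain dict stands in for the kernel's global namespace and `exec` stands in for running a cell, so the names `kernel`, `fresh`, and `cell_2` are illustrative and not anything Jupyter provides.

```python
kernel = {}  # stands in for the live kernel's global namespace

# Cell 1, first version: defines `data`.
exec("data = [1, 2, 3]", kernel)

# Cell 1, edited: the variable is renamed and its definition changed.
# The old `data` binding is still alive in the kernel.
exec("samples = [10, 20, 30]", kernel)

# Cell 2 was never updated and still refers to the old name.
cell_2 = "total = sum(data)"
exec(cell_2, kernel)
print(kernel["total"])  # 6, silently computed from the stale `data`

# A "restart and run all", simulated with a fresh namespace, surfaces the bug.
fresh = {}
exec("samples = [10, 20, 30]", fresh)
try:
    exec(cell_2, fresh)
    rerun_error = None
except NameError as err:
    rerun_error = err
print(type(rerun_error).__name__)  # NameError
```

The live session happily produces a number from data that no longer exists in any cell, while the clean rerun fails loudly, which is exactly why people restart the kernel before trusting their results.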

This is something that I struggled with as well. Back before Jupyter was a big thing I wrote a system called bein (https://github.com/madhadron/bein) that promoted the individual execution to the primary artifact instead of the source code, since that's usually what data analysis and other computational science work really cares about.

Based on my intervening (looks at git timestamps) decade of thinking, I would probably approach it differently, but I think that key point of execution as the artifact and wanting to trace its provenance instead of wanting to track source code remains correct.

Also been using nbdev for work projects in the past month. So far, it's been a great productivity boost.

I really, really like the specific aspect of testing with nbdev.

Your docs/examples are your tests. The notebook is a natural environment for scaffolding mocks and other things, without having to layer a testing framework on top of unittest objects.
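For anyone who hasn't seen the style, this is roughly what "examples as tests" looks like as a pair of cells. It is a plain-Python sketch, not nbdev's actual machinery, and `slugify` is a hypothetical function invented for the example.

```python
# Cell 1: the code that would end up in the library.
def slugify(title):
    """Turn a title into a lowercase, hyphen-separated slug."""
    words = "".join(c if c.isalnum() else " " for c in title).split()
    return "-".join(w.lower() for w in words)

# Cell 2: an example that documents the function and fails loudly if it breaks.
example = slugify("Hello, Jupyter World!")
assert example == "hello-jupyter-world"
assert slugify("  nbdev  ") == "nbdev"
assert slugify("") == ""
```

The second cell reads as documentation for a human, and running the notebook from top to bottom acts as the test run, with no separate test class or runner to write.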

I'm a long-time user of IDEs, and always will be, but if your metric for producing code is just lines of code, nbdev isn't for you. However, if your metric is producing documented, tested code that is maintainable, it's definitely, for me and my team, a big productivity boost.

Didn't pydoc already have inline testing?

Sure, but git already had the ability to commit files to it and pip already had the ability to create packages.