Hacker News new | ask | show | jobs
by __mharrison__ 459 days ago
I read this and think: "I would love to spend a day with Jane Street and teach them how to use notebooks." So much effort is wasted because of knowledge gaps or systems that don't encourage best practices.

I just taught a course to a client this week helping them with this (and other best practices for Python).

2 comments

I've been back and forth whether notebooks are really "best practice".

Currently leaning towards no, as they are hard to test, compose, and version control (it is possible, but you need bespoke notebook-specific tools, and most just don't).

Why are you trying to test and compose notebooks? Notebooks don't really fit into that lifecycle of work, they're implicitly script-like code.

Any time a notebook needed to be tested, refactoring it into a module and testing that was the better choice.

Notebooks in my opinion are by far the best tool for interactive and exploratory data work. I've been using Jupyter/IPython for 10 years and it takes very little discipline to keep them clean and clear. I've never bothered trying to deploy them in any meaningful way.

The quirks with version control are annoying, that's one reason I've switchex to marimo in the last few months.

I'm intrigued - I didn't listen to the post but wonder what kind of best practices you're talking about.

Do you have this written down anywhere or on video anywhere? I'd love to learn more about what you mean.

I’m intrigued as well.. My experience is notebooks struggle as a format for production code. We encourage people who work heavily in notebooks to use them for exploratory work, but choose other tools when it comes time to ship.

When you are exploring something, experimenting, showing.. it’s great; train-of-thought structure, APIs like Pandas optimised for writing and terseness etc.

But when you have a piece of code that will lose a million dollars a minute if someone ships a bug, and which will be maintained by many engineers over many years, then you really want a format that’s optimised for long-term maintenance, incremental change, testability, and APIs optimised for readers.

I write production code, I also work lots in Jupyter notebooks.

Personally, I think the fact that notebooks are usually easier/funner for me to work with is a big problem. I'm by no means a Clojure expert, but I did do a semi-large project in Clojure a few years ago, and some of the ideas of true REPL-driven development that exist there are things I wish that Python supported.

It's hard to explain without actually learning it for real (and most Python devs mistakenly think Python has REPL-driven development; I sure did before learning Clojure!). But once you get used to being able to interact with your actual source code, and at any point just being able to write new code and immediately print out its value, then with one shortcut make it part of the regular codebase... that just blurs the distinction that exists between Jupyter Notebooks and production code in a way that makes everything much better.

I'd love to hear more about this REPL-driven development. I've heard people bring it up from time to time, but it's clearly very different from the typical "stateless horizontal micro-service" that has become common practice.

What tools are used for "write new code and immediately print out its value, then with one shortcut make it part of the regular codebase" and how does that square with working on a team and getting code reviewed?

My book, Effective Pandas 2, has many of them. There's a few conference talks of mine floating around on YouTube that also mention some.

Writing clean data code is one aspect. Filling in knowledge gaps is another. Covertly teaching software engineering best practices to folks who "aren't programmers" yet sit down and write code in Jupyter all day is another.

I should probably write a blog post or record a short video. (I just taught a week long course on this for a client last week.)