| HN Mirror

I definitely agree (and I think given what you work on you would be horrified by how I define cached functions that capture locals), but I think in practice getting to a state where you can restart your kernel often makes it easier to reason about state. But you’re definitely right, it would be better to reason correctly here.

One thing I’ve toyed with is writing a Jupyter kernel extension that notes what new locals you’ve defined in a cell, figures out what locals are read, and creates a (cached) function from the cell. E.g. a cell that has `y = a @ x + b` becomes

    @cache_to_disk
    def compute_y(a, x, b):
        return a @ x + b
    y = compute_y(a, x, b)

I don’t worry much about serialization - 90% of the time what I need to cache is dataframes (write to parquet), and the rest is trained models (custom serializer). People rarely need to cache generators, in my opinion.