|
|
|
|
|
by smacke
1131 days ago
|
|
Personally I don't like the "write to disk" approach; I think it kind of just punts the state problem somewhere else (i.e. from memory to disk). Writing to a database and adding versioning is better, but that's a lot of machinery to expect a notebook user to adopt (though maybe better tooling could help). Also a lot of Python objects are not out-of-the-box pickleable (e.g. generators). Also pickle is a mess. |
|
One thing I’ve toyed with is writing a Jupyter kernel extension that notes what new locals you’ve defined in a cell, figures out what locals are read, and creates a (cached) function from the cell. E.g. a cell that has `y = a @ x + b` becomes
I don’t worry much about serialization - 90% of the time what I need to cache is dataframes (write to parquet), and the rest is trained models (custom serializer). People rarely need to cache generators, in my opinion.