
by ogrisel 1160 days ago

Also note that with Python 3.8+ and pickle protocol 5, it is now just as efficient to do:

  import pickle

  with open("model.pkl", mode="wb") as f:
      pickle.dump(trained_model, f, protocol=pickle.HIGHEST_PROTOCOL)

  with open("model.pkl", mode="rb") as f:
      trained_model = pickle.load(f)

With protocol 5, pickle from the standard library can store and load the large data buffers often found as attributes of scikit-learn models (typically large numpy arrays) without extra memory copies. This is what joblib.dump and joblib.load were designed to do, using a few hacks that violate the official pickle protocol.

1 comment

ogrisel 1160 days ago

For reference, pickle protocol 5 was specified and implemented as part of:

- https://peps.python.org/pep-0574/

It also provides an extra API to handle large data buffers externally ("out-of-band") via custom callbacks. This is in addition to the no-copy memory optimization that applies when loading or storing such arrays "in-band", without providing custom callbacks.
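To make the in-band vs out-of-band distinction concrete, here is a minimal sketch using only the standard library. The names `payload` and `model` are made up for illustration, and the bytearray is just a stand-in for the memory of a large numpy array (numpy arrays expose their data to protocol 5 in a similar way, but that is not shown here).

```python
import pickle

# Stand-in for the memory of a large array (hypothetical data).
payload = bytearray(b"x" * 1_000_000)

# Wrapping the buffer in PickleBuffer tells pickle it may be handled
# out-of-band when a buffer_callback is supplied.
model = {"name": "demo", "weights": pickle.PickleBuffer(payload)}

# In-band: no callback, so the bytes are written into the pickle stream.
in_band = pickle.dumps(model, protocol=5)

# Out-of-band: the callback receives each buffer. list.append returns None
# (a false value), which tells pickle to leave the data out of the stream.
buffers = []
out_of_band = pickle.dumps(model, protocol=5, buffer_callback=buffers.append)

# Loading needs the same buffers, in the same order.
restored = pickle.loads(out_of_band, buffers=buffers)

print(len(in_band) > len(payload))       # stream carries the data
print(len(out_of_band) < 1000)           # stream carries only metadata
print(memoryview(restored["weights"]) == payload)
```

In a real setup the buffers would be sent or stored by some other channel (shared memory, a separate file, a socket), which is the point of the callback API; keeping them in a list is only for demonstration.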
