Note that joblib serialization is pickle-based and therefore has the same security implications as any pickle file: loading a joblib or pickle file is akin to running a compiled executable, so never do it if you do not trust the source.
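To see why, here is a minimal sketch of what a pickle file can do at load time. Any object can define `__reduce__` to name a callable that `pickle.loads` will call while rebuilding it; here that callable is the built-in `eval`. The `Payload` class is hypothetical and the expression is deliberately harmless, but a malicious file could supply any code instead.

```python
import pickle


class Payload:
    """Hypothetical object crafted by whoever produces the file."""

    def __reduce__(self):
        # Tells pickle to "rebuild" this object by calling
        # eval("sum(range(10))") at load time. A malicious file could
        # pass any Python expression here instead.
        return (eval, ("sum(range(10))",))


data = pickle.dumps(Payload())

# The loading side does not need the Payload class at all:
# pickle.loads simply calls eval(...) and returns its result.
result = pickle.loads(data)
print(result)  # 45
```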
Also note that with Python 3.8+ and pickle protocol 5, it is now just as efficient as joblib to use the standard library's pickle module directly:
import pickle

with open("model.pkl", mode="wb") as f:
    pickle.dump(trained_model, f, protocol=pickle.HIGHEST_PROTOCOL)  # at least 5 on Python 3.8+

with open("model.pkl", mode="rb") as f:
    trained_model = pickle.load(f)
With protocol 5, the standard library's pickle can store and load the large data buffers often found as attributes of scikit-learn models (typically large NumPy arrays) without extra memory copies. This is what joblib.dump and joblib.load were designed to achieve, using a few hacks that violate the official pickle protocol.
Protocol 5 also provides an extra API to handle large data buffers externally ("out-of-band") via custom callbacks. This comes in addition to the no-copy memory optimization that applies when such arrays are stored and loaded "in-band", without any custom callbacks.
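The out-of-band mode can be sketched with a single NumPy array, assuming NumPy is installed (NumPy arrays support pickle protocol 5 buffers for contiguous data). Passing `buffer_callback` to `pickle.dumps` collects the array's data buffer separately instead of embedding it in the pickle byte string, and the same buffers must then be handed back to `pickle.loads` through its `buffers` argument. Where those buffers live in between (separate files, shared memory, a network transfer) is left to the application; the in-memory list below is only for illustration.

```python
import pickle

import numpy as np

arr = np.arange(1_000_000, dtype=np.float64)  # about 8 MB of data

# In-band: the array data is written into the pickle byte string itself.
in_band = pickle.dumps(arr, protocol=5)

# Out-of-band: the data buffer is passed to the callback instead.
# list.append returns None (a false value), which tells pickle to keep
# that buffer out of the pickle stream.
buffers = []
out_of_band = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

print(len(in_band), len(out_of_band), len(buffers))

# Loading requires passing back the same buffers, in the same order.
restored = pickle.loads(out_of_band, buffers=buffers)
assert np.array_equal(restored, arr)
```

The out-of-band pickle only holds metadata (dtype, shape, how to rebuild the array), so its size stays small regardless of the array size.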
A new safer alternative for scikit-learn model persistence is skops:
- https://skops.readthedocs.io/en/stable/persistence.html
It lets you explicitly list the Python object types you trust as safe to load, and it refuses to load skops files containing any other (untrusted) types.