by ogrisel
3232 days ago
The pickling implementation of joblib supports memory-mapping numpy arrays nested in arbitrary data structures such as pandas dataframes. Save the dataframe in a folder that can be accessed by the gunicorn workers:

  import joblib
  joblib.dump(df, '/folder/shared_data.pkl')

Then, in the code run by the flask / gunicorn workers themselves:

  import joblib
  shared_df = joblib.load('/folder/shared_data.pkl', mmap_mode='r')
  # use shared_df as usual (in-place modifications are not
  # allowed: the memory-mapped arrays are read-only)

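A minimal end-to-end sketch of the same steps, runnable on a single machine. It assumes numpy and pandas are installed, uses a temporary directory in place of `/folder/`, and uses a small hypothetical dataframe; the dict wrapper only illustrates the "nested in arbitrary data structures" point.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd

# Hypothetical example data standing in for the real dataframe.
df = pd.DataFrame({
    "x": np.arange(1_000, dtype=np.float64),
    "y": np.linspace(0.0, 1.0, 1_000),
})
# Nest the dataframe in an arbitrary structure next to a plain numpy array.
payload = {"df": df, "weights": np.ones(1_000)}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "shared_data.pkl")
    # Default (uncompressed) dump: compressed files cannot be memory mapped.
    joblib.dump(payload, path)

    # What each worker would do: the arrays come back backed by the file on
    # disk instead of being copied into the worker's own memory.
    shared = joblib.load(path, mmap_mode="r")
    shared_df = shared["df"]
    weights = shared["weights"]

    is_memmap = isinstance(weights, np.memmap)
    is_writeable = weights.flags.writeable
    same_frame = shared_df.equals(df)
    weighted_sum = float((shared_df["x"].to_numpy() * weights).sum())
    # Drop references to the mapped data before the folder is removed.
    del shared, shared_df, weights

print(is_memmap, is_writeable, same_frame, weighted_sum)
```

When several processes map the same file read-only, the operating system can serve them all from the same cached pages. That is general mmap behaviour rather than something specific to joblib, and it is what lets the workers avoid holding separate copies.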
Some pandas functions can have issues with read-only buffers, though: https://github.com/pandas-dev/pandas/issues/17192 (caused by a currently unsolved bug / limitation of Cython), but it can work for your use case.
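The read-only restriction can be observed directly. The sketch below makes the same assumptions (hypothetical data, a temporary folder instead of `/folder/`). It shows numpy rejecting a write into a memory-mapped array. It also shows one possible workaround that does not come from the post: copy just the data a problematic function needs into process-local memory, which gives up the sharing for that copy.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(5, dtype=np.float64)})  # hypothetical data

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "shared_data.pkl")
    joblib.dump({"df": df, "weights": np.ones(5)}, path)
    shared = joblib.load(path, mmap_mode="r")

    # Any in-place write into a memory-mapped (mode "r") array raises ValueError.
    try:
        shared["weights"][0] = 2.0
        write_rejected = False
    except ValueError:
        write_rejected = True

    # Workaround: private, writable copies (each one costs memory per worker).
    local_weights = np.array(shared["weights"])  # np.array copies by default
    local_weights[0] = 2.0
    local_df = shared["df"].copy()  # deep copy of the mmap-backed frame
    local_df.loc[0, "x"] = -1.0

    results = (write_rejected, local_weights.tolist(), local_df["x"].tolist())
    del shared  # release the mapping before the folder is removed

print(results)
```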