|
|
|
|
|
by xmaayy
1055 days ago
|
|
> 1. For multiples processes use `gunicorn` This will load up multiple processes like you say. OP loads a large dataset and gUnicorn would copy that dataset in each process. I have never figured out shared memory with gUnicorn. |
|
Assuming you're on Linux/BSD/MacOS, sharing read-only memory is easy with Gunicorn (as opposed to actual POSIX shared memory, for which there are multiprocessing wrappers, but they're much harder to use).
To share memory in copy-on-write mode, add a call to load your dataset into something global (i.e. a global or class variable or an lru_cache of a free/class/static method) in gunicorn's "when_ready" config function[1].
This will load your dataset once on server start, before any processes are forked. After processes are forked, they'll gain access to that dataset in copy-on-write mode (this behavior is not specific to python/gunicorn; rather, it's a core behavior of fork(2)). If those processes do need to mutate the dataset, they'll only mutate their copy-on-write copies of it, so their mutations won't be visible to other parallel Gunicorn workers. In other words, if one request in a parallel=2 gunicorn mutates the dataset, a subsequent request has only a 50% likelihood of observing that mutation.
If you do need mutable shared memory, you could either check out databases/caches as other commenters have mentioned (Redislite[2] is a good way to embed Redis as a per-application cache into Python without having to run or configure a separate server at all; you can launch it in gunicorn's "when_ready" as well), or try true shared memory[3][4]
1. https://docs.gunicorn.org/en/stable/settings.html#when-ready 2. https://pypi.org/project/redislite/ 3. https://docs.python.org/3/library/multiprocessing.html#share... 4. https://docs.python.org/3/library/multiprocessing.shared_mem...