by dabacaba
776 days ago
Sharing Python objects between processes is not safe. Python assumes that access to objects is protected by the GIL, but this is not the case when using multiple processes (even if the processes only read from the objects, they still modify the refcounts). Sharing memory buffers can be done safely using mmap.
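For what it's worth, here is a minimal sketch of the mmap approach, assuming a Unix system where os.fork() is available. The run() helper and the 8-byte layout are just illustration, not anything from the repo. The child writes raw bytes into an anonymous shared mapping and the parent reads them back; no Python object crosses the process boundary.

```python
import mmap
import os
import struct


def run():
    # Anonymous mapping (fileno=-1); on Unix it is MAP_SHARED by default,
    # so a forked child sees and modifies the same physical pages.
    buf = mmap.mmap(-1, 8)
    pid = os.fork()
    if pid == 0:
        # Child: write a raw 64-bit integer into the shared buffer and exit.
        buf[:8] = struct.pack("q", 42)
        os._exit(0)
    # Parent: wait for the child, then read the bytes it wrote.
    os.waitpid(pid, 0)
    return struct.unpack("q", buf[:8])[0]


if __name__ == "__main__":
    print(run())
```

Only plain bytes live in the shared region, so refcounts never come into play.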
|
When you think about it, you already rely on that to access imported modules, global variables, etc. in your worker processes. The GIL and reference counts are not a concern either: both are copied from the parent process as well, so you can freely keep reading global state (and even modify it, though the changes stay local to the worker process).
That is, if you have large objects to share with your worker processes and you don't need to get them back, you can simply assign them to global variables before starting your multiprocessing pool and have your workers read them, safely and at zero transfer cost. This _trick_ has been used for decades in Python.
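A minimal sketch of that trick, assuming the "fork" start method (so Unix only; with "spawn" the workers re-import the module and would not see the assignment). BIG_TABLE, lookup and run are made-up names for illustration.

```python
import multiprocessing as mp

# Hypothetical large read-only object, filled in before the pool is created.
BIG_TABLE = None


def lookup(i):
    # Runs in a worker: reads the global inherited from the parent via fork.
    return BIG_TABLE[i] * 2


def run():
    global BIG_TABLE
    BIG_TABLE = list(range(1_000_000))
    # Request fork explicitly; it is not the default start method everywhere.
    ctx = mp.get_context("fork")
    with ctx.Pool(2) as pool:
        # Only the small integer arguments are pickled, never BIG_TABLE.
        return pool.map(lookup, [0, 10, 999_999])


if __name__ == "__main__":
    print(run())  # [0, 20, 1999998]
```

Anything a worker assigns to BIG_TABLE stays in that worker; the parent never sees it.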
Now, the annoying thing is that the global-variable mumbo jumbo is kind of dirty and ad hoc to maintain. Using custom picklable weakrefs (something similar to the trick with id() in this repo), you can create weak proxy objects to pass to your workers, achieving the same effect while keeping a "function argument" interface instead of proxying through global variables.
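Roughly what I mean, as a sketch rather than the repo's actual code: ForkProxy, _REGISTRY and _resolve are hypothetical names, and for simplicity the registry holds a strong reference instead of a real weakref. It assumes the "fork" start method and that the proxy is created before the pool is started.

```python
import multiprocessing as mp

# Hypothetical registry mapping id(obj) -> obj; workers inherit it through fork.
_REGISTRY = {}


def _resolve(key):
    # Runs in the worker while unpickling: fetch the inherited object.
    return _REGISTRY[key]


class ForkProxy:
    """Picklable stand-in that carries only the id() of the real object."""

    def __init__(self, obj):
        self.key = id(obj)
        # Strong reference (a simplification): keeps obj alive so the id stays valid.
        _REGISTRY[self.key] = obj

    def __reduce__(self):
        # Pickle just the integer key, never the payload itself.
        return (_resolve, (self.key,))


def total(data):
    # The worker receives the real list, resolved from the inherited registry.
    return sum(data)


def run():
    big = list(range(100_000))
    proxy = ForkProxy(big)  # must exist before the pool forks
    ctx = mp.get_context("fork")
    with ctx.Pool(2) as pool:
        return pool.apply(total, (proxy,))


if __name__ == "__main__":
    print(run())  # 4999950000
```

The worker function takes the object as a normal argument, yet only an integer goes through the pipe. A proxy created after the pool has forked would fail with a KeyError in the worker, since the registry entry would not have been inherited.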