by zbentley
1055 days ago
The situation is a bit more complicated than this. Child processes usually don't duplicate all of their parent's memory, but that does happen on some platforms (macOS and Windows) with some Python versions. When copy-on-write pages get unexpectedly dirtied is also a nuanced question, as some of the sibling comments point out.

I'll copy the tl;dr from another comment I've made nearby. There are three main ways to share data into your multiprocessing processes:

1. Send the data to them via IPC (pickling and copying), e.g. with "multiprocessing.Pool.apply_async()", "multiprocessing.Queue.put()" or "concurrent.futures.ProcessPoolExecutor.submit()".

2. Copy-on-write via fork(2). In this mode, globally visible Python data structures that were created before your Pool/ProcessPoolExecutor are accessible to code in the child processes for (nearly) free: no pickling, and no copying unless they are mutated in the child. There are two caveats, which I've discussed in other comments on this thread. First, mutation can happen through garbage collection even if you never explicitly change the fork-shared data in Python[1]. Second, multiprocessing does not use fork(2) by default on macOS or Windows[2].

3. Use the explicit shared-memory data structures that multiprocessing provides[3][4].

1. https://news.ycombinator.com/item?id=36940118
2. https://news.ycombinator.com/item?id=36941791
3. https://docs.python.org/3/library/multiprocessing.html#share...
4. https://docs.python.org/3/library/multiprocessing.shared_mem...
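For illustration, here is a minimal sketch of the three approaches side by side. It was written for this edit rather than taken from the comment. The global `BIG_TABLE` and the helpers (`via_ipc`, `via_fork_inheritance`, `via_shared_memory`, `pick_context`) are hypothetical names. The sketch asks for the "fork" start method where the platform offers it, because that is what makes approach 2 copy-on-write. Under "spawn" (the default on Windows and macOS), the child re-imports the module and rebuilds `BIG_TABLE` instead of inheriting it. It does not demonstrate the GC caveat: CPython's reference counting and garbage collector write to object headers, so a forked child may still end up copying some of the pages it only meant to read.

```python
import multiprocessing as mp
from multiprocessing import shared_memory

# Hypothetical module-level data, created *before* any child starts.
# Under the "fork" start method, children inherit it via copy-on-write.
BIG_TABLE = list(range(100_000))


def _sum_from_queue(inbox, outbox):
    # Approach 1: the data arrives as a pickled copy through a Queue.
    data = inbox.get()
    outbox.put(sum(data))


def _sum_inherited(outbox):
    # Approach 2: read the inherited global directly. Nothing is pickled on
    # the way in; only the small result travels back over the queue.
    outbox.put(sum(BIG_TABLE))


def _double_in_place(shm_name, length):
    # Approach 3: attach to an explicit shared-memory block by name and
    # mutate it. The parent sees the change because the pages are shared.
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        for i in range(length):
            shm.buf[i] = (shm.buf[i] * 2) % 256
    finally:
        shm.close()


def pick_context():
    # fork is unavailable on Windows and not the default on macOS, so fall
    # back to the platform's default start method when it is missing.
    if "fork" in mp.get_all_start_methods():
        return mp.get_context("fork")
    return mp.get_context()


def via_ipc(ctx):
    inbox, outbox = ctx.Queue(), ctx.Queue()
    p = ctx.Process(target=_sum_from_queue, args=(inbox, outbox))
    p.start()
    inbox.put(BIG_TABLE)  # pickled and copied into the child
    inbox.close()
    inbox.join_thread()   # wait until the feeder thread has flushed the data
    result = outbox.get()
    p.join()
    return result


def via_fork_inheritance(ctx):
    outbox = ctx.Queue()
    p = ctx.Process(target=_sum_inherited, args=(outbox,))
    p.start()
    result = outbox.get()
    p.join()
    return result


def via_shared_memory(ctx, values):
    shm = shared_memory.SharedMemory(create=True, size=len(values))
    try:
        shm.buf[:len(values)] = bytes(values)
        p = ctx.Process(target=_double_in_place, args=(shm.name, len(values)))
        p.start()
        p.join()
        return bytes(shm.buf[:len(values)])  # copy out what the child wrote
    finally:
        shm.close()
        shm.unlink()  # the creator is responsible for freeing the block


if __name__ == "__main__":
    ctx = pick_context()
    print("start method:  ", ctx.get_start_method())
    print("IPC copy:      ", via_ipc(ctx))
    print("fork-inherited:", via_fork_inheritance(ctx))
    print("shared memory: ", list(via_shared_memory(ctx, [1, 2, 3, 200])))
```

Only approach 3 lets the child's writes be seen by the parent. In approach 2, a mutation in the child stays private to the child's copy-on-write pages.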