by oivey
1061 days ago
You have to manually set up shared memory with its own API that has its own limitations, right? I thought some seamless integration was a new feature, but AFAICT, transfers between processes still lead to things being pickled and copied. Am I wrong?
Only partially. When you send things to a multiprocessing.Pool/concurrent.futures.ProcessPoolExecutor, they're pickled and copied. "Sending" happens when passing arguments to e.g. "multiprocessing.Pool.apply_async()", "multiprocessing.Queue.put()" or "concurrent.futures.ProcessPoolExecutor.submit()".
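To make the "pickled and copied" part concrete, here is a minimal sketch. It assumes a plain dict payload, and queue_round_trip is a hypothetical helper name, not part of multiprocessing. It puts an object on a multiprocessing.Queue and reads it back in the same process, which is enough to show that what comes out is an unpickled copy rather than the original object:

```python
import multiprocessing as mp


def queue_round_trip(obj):
    """Put obj on a multiprocessing.Queue and read it back (hypothetical helper)."""
    q = mp.Queue()
    try:
        q.put(obj)               # pickled by the queue's background feeder thread
        return q.get(timeout=5)  # unpickled into a brand-new object
    finally:
        q.close()
        q.join_thread()


if __name__ == "__main__":
    original = {"rows": [1, 2, 3]}
    received = queue_round_trip(original)
    received["rows"].append(4)   # mutates only the copy
    print("same object:", received is original)
    print("original after mutating the copy:", original)
```

The same thing happens to arguments passed to apply_async() or submit(): the receiving side works on its own copy, so changes made there are not visible to the sender.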
However, there are two other ways to share data into your multiprocessing processes:
1. Copy-on-write via fork(2). In this mode, globally visible Python data structures created before your Pool/ProcessPoolExecutor are accessible to code in child processes for (nearly) free, with no pickling, and no copying unless they are mutated in the child process. Two caveats here, which I've discussed in other comments on this thread: mutation may occur via reference counting and garbage collection even if you don't explicitly change fork-shared data in Python[1]; and fork(2) is not the default start method on macOS and is not available at all on Windows[2].
2. Using explicit shared memory data structures provided by multiprocessing[3][4]. These do not incur the overhead (in CPU or copied memory) that pickle-based IPC does, but they are not without complexity or cost.
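A rough sketch of both strategies, under some assumptions: BIG_TABLE, sum_via_fork and double_via_shared_memory are hypothetical names made up for illustration, the "fork" start method is requested explicitly (so this will not run on Windows), and the shared-memory half treats the data as raw bytes for simplicity.

```python
import multiprocessing as mp
from multiprocessing import shared_memory

# Hypothetical module-level data, created before any child process exists.
BIG_TABLE = list(range(100_000))


def _sum_inherited(send_end):
    # Runs in the forked child. BIG_TABLE is inherited from the parent, not
    # pickled. On CPython, even reading it updates reference counts, so some
    # pages may still get copied (the caveat from [1]).
    send_end.send(sum(BIG_TABLE))
    send_end.close()


def sum_via_fork():
    ctx = mp.get_context("fork")  # assumption: a platform that supports fork
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_sum_inherited, args=(send_end,))
    proc.start()
    send_end.close()              # parent keeps only the receiving end
    total = recv_end.recv()       # only the small result is pickled
    proc.join()
    return total


def _double_in_place(name, length):
    # Attach to the existing block by name and mutate it directly.
    shm = shared_memory.SharedMemory(name=name)
    try:
        for i in range(length):
            shm.buf[i] = (shm.buf[i] * 2) % 256
    finally:
        shm.close()


def double_via_shared_memory(values):
    data = bytes(values)          # values must be ints in range(256)
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    try:
        shm.buf[:len(data)] = data
        # fork is used here only to keep the sketch self-contained; only the
        # block's name and length are passed to the child, not the data.
        ctx = mp.get_context("fork")
        proc = ctx.Process(target=_double_in_place, args=(shm.name, len(data)))
        proc.start()
        proc.join()
        return list(bytes(shm.buf[:len(data)]))
    finally:
        shm.close()
        shm.unlink()              # the creator is responsible for cleanup


if __name__ == "__main__":
    print(sum_via_fork())
    print(double_via_shared_memory([1, 2, 3, 100]))
```

In the first half the child never receives BIG_TABLE as an argument; it simply sees the parent's memory. In the second half the child's writes are visible to the parent because both map the same block, at the cost of managing the block's lifetime (close/unlink) by hand.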
Unfortunately, truly "seamless integration" is not really possible with multiprocessing, so users will have to use one or more of the above strategies according to their application needs.
[1] https://news.ycombinator.com/item?id=36940118
[2] https://news.ycombinator.com/item?id=36941791
[3] https://docs.python.org/3/library/multiprocessing.html#share...
[4] https://docs.python.org/3/library/multiprocessing.shared_mem...