by zbentley 1061 days ago
> Am I wrong?

Only partially. When you send things to a multiprocessing.Pool/concurrent.futures.ProcessPoolExecutor, they're pickled and copied. "Sending" happens when passing arguments to e.g. "multiprocessing.Pool.apply_async()", "multiprocessing.Queue.put()" or "concurrent.futures.ProcessPoolExecutor.submit()".
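To make the copy concrete, here's a minimal sketch: the parent puts a list on a multiprocessing.Queue, the child mutates what it receives and sends it back, and the parent's original is untouched. The names "child" and "send_and_mutate" are made up for illustration, and I'm assuming a POSIX system where the "fork" start method is available, which keeps the snippet short.

```python
import multiprocessing as mp


def child(inbox, outbox):
    # get() unpickles the payload, so `items` is a fresh object in this process.
    items = inbox.get()
    items.append("child")
    # put() pickles the list again for the trip back to the parent.
    outbox.put(items)


def send_and_mutate(items):
    # Assumption: POSIX with the "fork" start method available.
    ctx = mp.get_context("fork")
    inbox, outbox = ctx.Queue(), ctx.Queue()
    proc = ctx.Process(target=child, args=(inbox, outbox))
    proc.start()
    inbox.put(items)       # pickled here; the child never sees this exact object
    result = outbox.get()  # another unpickled copy, this time in the parent
    proc.join()
    return result


if __name__ == "__main__":
    original = [1, 2, 3]
    returned = send_and_mutate(original)
    print(original, returned)
```

The child's append never reaches the parent's list, because each hop through the queue is a serialize/deserialize round trip.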

However, there are two other ways to share data into your multiprocessing processes:

1. Copy-on-write via fork(2). In this mode, globally-visible data structures in Python that were created before your Pool/ProcessPoolExecutor was created are accessible to code in child processes for (nearly) free, with no pickling, and no copying unless they are mutated in the child process. Two caveats here, which I've discussed in other comments on this thread: mutation may occur via reference counting or garbage collection bookkeeping even if you don't explicitly change fork-shared data in Python[1]; and fork(2) is not the default multiprocessing start method on macOS, and is not available at all on Windows[2].

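2. Using explicit shared memory data structures provided by multiprocessing[3][4]. These do not incur the overhead (in CPU or copied memory) that pickle-based IPC does, but they come with their own complexity and cost: they hold raw bytes or ctypes values rather than arbitrary Python objects, and synchronization and cleanup are largely your responsibility.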
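Both strategies can be sketched in a few lines. This is an illustration under assumptions, not a recommended design: "LOOKUP", "sum_with_fork" and "double_in_shared_memory" are hypothetical names, and the "fork" start method is requested explicitly, so this only runs on POSIX systems. The first half reads a global built before the children exist; the second has a child modify a multiprocessing.shared_memory.SharedMemory block in place.

```python
import ctypes
import multiprocessing as mp
from multiprocessing import shared_memory

# Hypothetical stand-in for a large read-only structure. It is built before any
# child exists, so forked children see it without pickling.
LOOKUP = list(range(100_000))


def _sum_inherited(start, stop, out, slot):
    # Reads fork-inherited memory. Reading still updates reference counts, so
    # some pages may get copied even though nothing is explicitly mutated.
    out[slot] = sum(LOOKUP[i] for i in range(start, stop))


def sum_with_fork(workers=4):
    ctx = mp.get_context("fork")  # assumption: POSIX; unavailable on Windows
    out = ctx.Array(ctypes.c_int64, workers)  # shared slots for partial sums
    step = len(LOOKUP) // workers
    procs = []
    for i in range(workers):
        stop = len(LOOKUP) if i == workers - 1 else (i + 1) * step
        procs.append(ctx.Process(target=_sum_inherited, args=(i * step, stop, out, i)))
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return sum(out[:])


def _double_in_place(name, length):
    shm = shared_memory.SharedMemory(name=name)  # attach by name; no data copied
    try:
        for i in range(length):
            shm.buf[i] = (shm.buf[i] * 2) % 256  # each cell is a single byte
    finally:
        shm.close()


def double_in_shared_memory(values):
    ctx = mp.get_context("fork")
    data = bytes(values)  # values must be a non-empty sequence of ints in 0..255
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    try:
        shm.buf[:len(data)] = data
        proc = ctx.Process(target=_double_in_place, args=(shm.name, len(data)))
        proc.start()
        proc.join()
        return list(shm.buf[:len(data)])  # the child's writes are visible here
    finally:
        shm.close()
        shm.unlink()  # the creator is responsible for removing the block
```

Note how much bookkeeping the shared memory half needs (byte-level layout, close(), unlink()) compared with the fork half, which is the "complexity or cost" tradeoff in practice.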

Unfortunately, truly "seamless integration" is not really possible with multiprocessing, so users will have to use one or more of the above strategies according to their application needs.

[1] https://news.ycombinator.com/item?id=36940118
[2] https://news.ycombinator.com/item?id=36941791
[3] https://docs.python.org/3/library/multiprocessing.html#share...
[4] https://docs.python.org/3/library/multiprocessing.shared_mem...