Hacker News new | ask | show | jobs
by Galanwe 775 days ago
The `multiprocessing` and `concurrent.futures` pools support a fork strategy on Linux (which is the default).

What that means is that your worker processes run in an interpreter whose state is the same as the parent. Effectively you will then inherit imported modules, global variables, etc.

The mechanisms to send objects across threads do rely on pickling objects in a queue, yes. But if you know you will use a forking pool, you can just put your variables as globals before the fork and access them in your workers.

Essentially, you can change this:

    def worker(obj):
        print(obj)

    def parent():
        objs = [obj1, obj2]
        with Pool(2) as pool:
            pool.map(worker, objs) # each obj is pickled and sent to the worker through a queue
To this:

    OBJS = []

    def worker(idx):
        print(OBJS[idx])

    def parent():
        OBJS.append(obj1)
        OBJS.append(obj2)
        with Pool(2) as pool:
            pool.map(worker, range(len(OBJS)))
        del OBJS[0]
        del OBJS[1]
> there are special things like Manager I haven't used.

Managers are an abstraction over a shared memory, and objects that can be safely stored in it. From my experience it's cumbersome to use and very limited in functionality.

1 comments

I tried this on a Linux machine in Py3 earlier, which I think is equivalent to your example:

  from multiprocessing import Pool

  shared = {}

  def f(x):
    global shared  # idk if this matters but tried without it too
    shared[x] = x
    print(shared)

  if __name__ == '__main__':
    with Pool(5) as p:
      p.map(f, [1, 2, 3])
    print(shared)
It printed {1: 1}, {2: 2}, {3: 3} then {} at the end. So the parent and other children didn't see the change made by each child. Doesn't seem like they're RW sharing.
Yes, as mentioned, the workers get a copy of the interpreter states. Every modification made in a worker to these objects stay local to the worker. These tricks are for sending data to a worker, not receiving back data.