|
> The problem is, once you access such shared objects in Python, it is never readonly access but actually read-write, because it modifies the refcount.

That is right, but it is a mere drop in the ocean. First, because reference counting is not intrusive in CPython (meaning the reference counting structures live outside the PyObject, last I checked), so you will mainly copy-on-write these small external structures anyway.

Second, what I'm describing here is for when pickling objects across workers is prohibitively slow and memory-consuming; typically that means sharing pandas dataframes of dozens or hundreds of gigabytes. A few copied refcount pages here and there are really not going to be the culprit.

> But also, why does it need to be a global variable? When you fork(), afterwards all the local variables are available to the child process. No need for global variables.

Right, but you need some way to access these variables, and once you're in a worker process you are simply in a different scope.

    from multiprocessing import Pool

    def workerfunc(x):
        # I'm a poor worker in an empty scope
        ...

    def parent():
        juicy_variable = ...
        with Pool(42) as pool:
            result = pool.map(workerfunc, [1, 2, 3])
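
For illustration, here is a minimal sketch of the global-variable workaround being discussed. It assumes a POSIX system where the "fork" start method is available; `_shared` and its list contents are hypothetical stand-ins for a large object such as a dataframe.

```python
import multiprocessing as mp

# Hypothetical stand-in for a large object (e.g. a big DataFrame).
_shared = None


def workerfunc(i):
    # The worker never receives _shared as an argument. It reads the
    # module-level name that the child inherited from the parent at fork time.
    return _shared[i] * 2


def parent():
    global _shared
    # Must be assigned BEFORE the pool forks its workers.
    _shared = list(range(1000))
    # Assumption: POSIX only; the "fork" start method does not exist on Windows.
    ctx = mp.get_context("fork")
    with ctx.Pool(2) as pool:
        # Only the small integers are pickled, not _shared itself.
        return pool.map(workerfunc, [1, 2, 3])


if __name__ == "__main__":
    print(parent())
```

Only the indices travel through the pool's pickling machinery; the large object reaches the workers through the forked address space.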
|
That's wrong, and it was never the case. The refcount has always been a field of the object itself:
Recent CPython: https://github.com/python/cpython/blob/6d419db10c84cacbb3862...
CPython 2.0: https://github.com/python/cpython/blob/2a9b0a93091b9ef7350a9...
CPython 0.9.8: https://github.com/python/cpython/blob/dd104400dc551dd4098f3...
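
A quick way to check this from Python itself, as a rough sketch. It assumes a default (GIL-enabled, non-debug) CPython build, where `id(obj)` is the object's address and `ob_refcnt` is the first field of the object header; `raw_refcount` and `Payload` are hypothetical names used only for this demonstration.

```python
import ctypes


def raw_refcount(obj):
    # Assumption: default CPython build, where id(obj) is the object's
    # address and ob_refcnt is the first field stored at that address.
    return ctypes.c_ssize_t.from_address(id(obj)).value


class Payload:
    pass


obj = Payload()
before = raw_refcount(obj)
alias = obj  # taking one more reference writes into the object's own header
after = raw_refcount(obj)

# The absolute values include temporary references held during the call,
# so only the difference between the two reads is meaningful.
print(after - before)
```

Since the counter sits in the object's own memory, merely taking a reference in a forked child dirties the page that holds the object.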
Regarding multiprocessing.Pool, that would not work as I said. I was thinking more about a plain fork, like this: