by albertzeyer
775 days ago
The problem is, once you access such shared objects in Python, it is never read-only access but actually read-write, because it modifies the refcount. The problem is also described here: https://ppwwyyxx.com/blog/2022/Demystify-RAM-Usage-in-Multip... But also, you say you would prefer such an unbound memory access hack instead of using a global variable? But also, why does it need to be a global variable? When you fork(), afterwards all the local variables are available to the child process. No need for global variables.

That is right, but it is a mere drop in the ocean. First, because reference counting is not intrusive in CPython (meaning the reference counting structures live outside the PyObject, last I checked), so you will mainly copy-on-write these small external structures anyway. Second, what I'm describing here is for when pickling objects across workers is prohibitively slow and memory-consuming; typically that means sharing pandas dataframes of dozens or hundreds of gigabytes. A few copied refcount pages here and there are really not going to be the culprit.
> But also, why does it need to be a global variable? When you fork(), afterwards all the local variables are available to the child process. No need for global variables.
Right, but you need some way to access these variables, and once you're in a worker process you are simply in a different scope.
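
To make the scope issue concrete, here is a minimal sketch of the global-variable pattern, assuming a POSIX system where the "fork" start method is available. The names `_SHARED`, `_worker` and `run` are made up for illustration, and a plain list stands in for a large dataframe.

```python
import multiprocessing as mp

# Hypothetical stand-in for a large object (e.g. a multi-gigabyte dataframe).
_SHARED = None


def _worker(i):
    # Runs in a forked child. _SHARED was inherited through fork(),
    # so the large object itself is never pickled; only the index i is.
    return _SHARED[i] * 2


def run(data, indices):
    global _SHARED
    # Must be assigned before the pool forks, otherwise the children
    # would inherit the old value (None).
    _SHARED = data
    ctx = mp.get_context("fork")  # assumption: POSIX only, not Windows
    with ctx.Pool(2) as pool:
        return pool.map(_worker, indices)
```

Calling `run(list(range(10)), [1, 2, 3])` returns `[2, 4, 6]`. The worker function only receives small arguments; it reaches the big object through module scope because that is the one scope a pool worker shares with the code that created the pool.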