by the8472
1061 days ago
Loading 100GB into RAM and then calling fork() is just painting a giant OOM killer target on your back. It'll work until something breaks the copy-on-write sharing, or the parent gets restarted while some forks still linger, or other fun things like that. Threads make it transparent to the OS that this memory really must be shared between compute tasks.
If you do have memory issues, calling 'gc.freeze()' right before creating your multiprocessing.Pool/Process/concurrent.futures.ProcessPoolExecutor is sufficient to mitigate refcount-related page dirtying in the vast majority of cases. In the small remaining minority of cases, 'gc.disable()' as suggested by the freeze docs[1] may help. If that still doesn't do it, or if your page-dirtying is due to actual mutations of data (not just refcounts), it may be time to reach for actual shared memory instead[2][3].
1. https://docs.python.org/3/library/gc.html#gc.freeze
2. https://docs.python.org/3/library/multiprocessing.html#share...
3. https://docs.python.org/3/library/multiprocessing.shared_mem...
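
For anyone who wants to see the 'gc.freeze()' suggestion in code, here is a minimal sketch. It assumes a POSIX system where the "fork" start method is available. DATA, chunk_sum and parallel_sum are made-up names for illustration, and the small list stands in for the real multi-gigabyte dataset.

```python
import gc
import multiprocessing as mp

# Hypothetical stand-in for a large read-only dataset loaded before forking.
DATA = list(range(1_000_000))


def chunk_sum(bounds):
    """Sum one slice of DATA; the worker only reads the inherited memory."""
    lo, hi = bounds
    return sum(DATA[lo:hi])


def parallel_sum(n_workers=4):
    """Sum DATA across forked workers, freezing GC state before the fork."""
    step = len(DATA) // n_workers
    bounds = [
        (i * step, len(DATA) if i == n_workers - 1 else (i + 1) * step)
        for i in range(n_workers)
    ]

    # Collect garbage first, then move every tracked object into the
    # permanent generation so later collections skip them.
    gc.collect()
    gc.freeze()
    try:
        # Request "fork" explicitly: the workers must inherit DATA via
        # copy-on-write rather than re-importing the module.
        ctx = mp.get_context("fork")
        with ctx.Pool(n_workers) as pool:
            return sum(pool.map(chunk_sum, bounds))
    finally:
        # Let the parent collect these objects again once the pool is gone.
        gc.unfreeze()


if __name__ == "__main__":
    print(parallel_sum())
```

The freeze has to happen in the parent before the pool is created, since that is when the workers are forked. The 'gc.unfreeze()' at the end is optional and only matters if the parent keeps running and should resume collecting those objects.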