by rfoo
596 days ago
> No idea where you got 10.
Because of the GIL, at most one thread in a Python process can run pure Python code at any moment. If I have a computation that spends 10% of its time in pure Python and 90% in C extensions, I can launch at most 10 threads and still expect mostly linear scalability, because 10 * 10% = 100%.
> the pure Python code in between calls into C extensions is irrelevant because you would not apply multithreading to it
No. There is a very important use case where the entire computation, driven by Python, is embarrassingly parallel and you'd want to parallelize that, instead of having internal parallelization in each of your C extension calls. So the pure Python code in between calls into C extensions MUST BE SCALABLE. The C extension code may not launch threads at all.
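The 10 * 10% = 100% arithmetic can be written down as a rough model. This is only a sketch under idealized assumptions (GIL-held time serializes perfectly, GIL-free time scales perfectly, no switching overhead); the function names and the `cores` parameter are made up for illustration.

```python
def max_useful_threads(gil_fraction: float) -> float:
    """Thread count at which the GIL becomes saturated.

    gil_fraction is the share of each thread's runtime spent holding the
    GIL (pure Python code). Idealized model, not a measurement.
    """
    if not 0.0 < gil_fraction <= 1.0:
        raise ValueError("gil_fraction must be in (0, 1]")
    return 1.0 / gil_fraction


def ideal_speedup(threads: int, gil_fraction: float, cores: int) -> float:
    """Best-case speedup: capped by thread count, core count, and the GIL."""
    return min(float(threads), float(cores), max_useful_threads(gil_fraction))


# 10% pure Python, 90% in GIL-releasing C extensions:
bound = max_useful_threads(0.10)
```

Under this model, adding threads beyond `bound` gives no further speedup, regardless of how many cores are available.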
> numpy isn't written in Python. However, there is a scalability issue: they can only drive so many threads (not 1, but not many) in a process due to GIL.
Now you have concocted this arbitrary example of why you can't use multithreading that has nothing to do with your original comment or my response.
> instead of having internal parallelization in each your C extensions call ... C extensions code may not launch thread at all.
I don't think you understood my comment - or maybe you don't understand Python multithreading. If a C extension is single threaded but releases the GIL, you can use multithreading to parallelize it in Python. For example, `ThreadPool(processes=100)` will create 100 threads within the current Python process, and they will soak all the CPUs you have -- without additional Python processes. I have done this many times with numpy, numba, vector indexes, etc.
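A minimal sketch of that pattern, using `hashlib` from the standard library as a stand-in for the GIL-releasing C extension (an assumption for illustration; the comment itself mentions numpy, numba, and vector indexes). The thread count, chunk sizes, and helper names are arbitrary.

```python
import hashlib
from multiprocessing.pool import ThreadPool


def digest(chunk: bytes) -> str:
    # hashlib releases the GIL while hashing large buffers, so several
    # threads can run this C code on different cores at the same time.
    return hashlib.sha256(chunk).hexdigest()


def parallel_digests(chunks, threads=8):
    # All workers are threads inside the current process; no extra
    # Python processes are started.
    with ThreadPool(processes=threads) as pool:
        return pool.map(digest, chunks)


# Hypothetical workload: 16 independent 1 MB buffers.
chunks = [bytes([i]) * 1_000_000 for i in range(16)]
results = parallel_digests(chunks)
```

`pool.map` returns results in input order, so `results[i]` corresponds to `chunks[i]`. How much speedup this gives depends on how long each call spends outside the GIL relative to the Python glue around it.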
Even for your workload, using multithreading for the GIL-free code in a hierarchical parallelization scheme would be far more efficient than naive multiprocessing.