| > I may have missed something but I couldn’t figure out how to get the multi-threaded performance out of Python Multiprocessing. The answer is to use the python multiprocessing module, or to spin up multiple processes behind wsgi or whatever. > Historically I’ve written several services that load up some big datastructure (10s or 100s of GB), then expose an HTTP API on top of it. Use the python multiprocessing module. If you've already written it with the multithreading module, it is a drop in replacement. Your data structure will live in shared memory and can be accessed by all processes concurrently without incurring the wrath of the GIL. Obviously this does not fix the issue of Python just being super slow in general. It just lets you max out all your CPU cores instead of having just one core at 100% all the time. |
This means you need to deal with stuck/dead processes. I’ve used multiprocessing extensively and once you hit a certain amount of usage, even in a pool, you just get hangs and unresponsive processes.
I’ve also written a huge amount of Cython wrapped c++ code which releases the GIL. This never hangs and I can multithread there all I want without issue.