by mmmmpancakes
1061 days ago
I mean, hopefully it's clear why you might need multiprocessing in Python? If you have a Python task that is highly parallelizable on a single machine with multiple cores, then multiprocessing is probably the right tool for quickly checking whether parallelism can dramatically speed up your code, with basically no code overhead and no investment in distributed solutions. (There are edge cases where it isn't, but it takes very little time to test whether you are one.) I run into this situation routinely in my data science workflow. It is an easy way to impress product people and managers: "hey, I made this batch algorithm 50x faster, so now it runs in 10 minutes instead of 500."
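To make the "very little time to test" point concrete, here is a minimal sketch of that kind of quick check. The prime-counting workload and the names `count_primes` and `make_chunks` are hypothetical stand-ins for whatever independent batch work you actually have; the only multiprocessing API used is `Pool.map`.

```python
import math
import time
from multiprocessing import Pool


def count_primes(bounds):
    """CPU-bound stand-in for one independent unit of a batch job."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # Trial division up to sqrt(n); deliberately naive so it burns CPU.
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def make_chunks(limit, n_chunks):
    """Split [0, limit) into contiguous (lo, hi) ranges of roughly equal width."""
    step = math.ceil(limit / n_chunks)
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


if __name__ == "__main__":
    chunks = make_chunks(200_000, 16)

    start = time.perf_counter()
    serial = sum(map(count_primes, chunks))
    print(f"serial:   {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    with Pool() as pool:  # worker count defaults to the CPU count Python reports
        parallel = sum(pool.map(count_primes, chunks))
    print(f"parallel: {time.perf_counter() - start:.2f}s")

    # Both paths must agree, otherwise the speedup is meaningless.
    assert serial == parallel
```

If the parallel timing isn't clearly better on your real workload, you are probably one of the edge cases: the work per chunk is too small, or too much data has to move between processes.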
|
My experience, after first getting into Python, was:
1. I needed to do something in parallel on one set of data.
2. In CPython, threads don't run CPU-bound Python code in parallel (the GIL lets only one thread execute bytecode at a time), so my program slowed down when I used threads.
3. So I tried multiprocessing. My program slowed down even more, because any communication between processes goes through pickle. I was trying to process one dataset in parallel and pass big chunks back for a final processing step.
4. So I saved the dataset to disk and loaded it into each process, multiplying my memory usage by 16.
5. I then threw it all out and wrote the performance-critical bits in C++, using SWIG to automagically generate the Python interface.
So knowing why (concurrency) isn't necessarily enough.
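For anyone wondering how big the pickle cost in that story can get, here is a rough sketch that measures it without spawning any processes. The functions `transform_chunk`, `summarize_chunk`, and `pickled_size` are hypothetical names for illustration, and `pickle.dumps` with default settings is only an approximation of what multiprocessing serializes when a worker returns a result.

```python
import pickle
import time


def transform_chunk(chunk):
    """Returns a result as large as its input; all of it must be pickled
    to travel back to the parent process."""
    return [x * x for x in chunk]


def summarize_chunk(chunk):
    """Reduces inside the worker, so only one number crosses the
    process boundary."""
    return sum(x * x for x in chunk)


def pickled_size(obj):
    """Approximate number of bytes needed to ship obj between processes."""
    return len(pickle.dumps(obj))


if __name__ == "__main__":
    chunk = list(range(1_000_000))
    for worker in (transform_chunk, summarize_chunk):
        result = worker(chunk)
        start = time.perf_counter()
        size = pickled_size(result)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{worker.__name__}: {size:,} bytes, pickled in {elapsed_ms:.1f} ms")
```

If the job can be shaped like `summarize_chunk`, multiprocessing tends to pay off. If every worker has to hand back something like `transform_chunk`'s output, the serialization can eat the gains, which matches what happened above.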