| I would call those options #2 and #3. Option #1 is: 1. Parallelize on a single machine. Most scripting languages have single-threaded bottlenecks (Python, Ruby, node.js, etc.), so this means using multiple processes.
xargs -P goes a long way. The coding changes are essentially a subset of what you need to distribute your program anyway. 32x or 64x speedup is nothing to sneeze at on a modern machine. The difference between 5 minutes and 5 hours usually solves your problem, practically speaking. And this means you don't have to touch every line of code, as you would if you were doing a C++ rewrite. But also don't forget that you can often rewrite 10% of your code in C++ and keep the other 90% in Python, and get a 10x speedup. This requires fairly deep understanding of both your program and of the Python/C interface. It helps to adopt a data flow style so you are not crossing the boundary a lot. And make sure you release the GIL, and consider starting threads in C++ rather than in Python, etc. I've also optimized an R program in C++ and gotten 125x speedup -- and that's single threaded C++; multithreaded would be another 2 orders of magnitude!!! But it also involved fixing a bunch of R performance errors, so don't underestimate that too. In practice, any program which you actually care enough about to rewrite for speed usually has some application-level performance bugs -- i.e. slowness unrelated to the slowness of the underlying language platform. People scale out to many machines because they don't want to rewrite every line of code. But the first step is to "scale out" by using all the cores on a single machine. In some sense, this is the best of both worlds, because you are incurring neither the overhead of distribution (serialization and networking) nor the complexity of distributed error handling. |
In my experience, you're best off optimizing relatively limited pieces of a system rather than big applications. And only consider the rewrite when you've looked at optimizing it in place. This means that the thing that needs to be optimized with a rewrite often has a chance of not having any giant stupid mistakes.
For example see http://bentilly.blogspot.com/2011/02/finding-related-items.h... for a case where I found a thousandfold speed increase by rewriting from SQL to C++. As much as was reasonably possible, I did not change the basic algorithm.