I found that writing cProfile's profiling data to a file and analysing that with either RunSnakeRun's[1] or KCacheGrind's[2] GUI gave me much more insight than just the tabular display from cProfile.
Either tool gives you a good visual overview of where your code spends its time, and a good interface for drilling down into subroutines to pinpoint the exact location of the slow code.
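For anyone who hasn't done this before, here is a minimal sketch of getting cProfile's data into a file. The workload (`slow_sum`, `main`), the helper `profile_to_file` and the file name `example.prof` are made up for illustration; only the `cProfile` and `pstats` calls are standard library.

```python
import cProfile
import os
import pstats
import tempfile


def slow_sum(n):
    # Hypothetical workload: deliberately naive so it shows up in the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    # Hypothetical entry point standing in for your real program.
    return [slow_sum(20000) for _ in range(20)]


def profile_to_file(func, path):
    """Run func under cProfile and write the raw stats to path."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    # The dump is the binary format that pstats (and GUI viewers) read.
    profiler.dump_stats(path)


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "example.prof")
    profile_to_file(main, path)
    # Sanity check: the same file also loads back into the tabular view.
    pstats.Stats(path).sort_stats("cumulative").print_stats(5)
```

For a whole script, `python -m cProfile -o example.prof yourscript.py` produces the same kind of file. As far as I know RunSnakeRun opens that file directly, while KCacheGrind needs it converted to callgrind format first (pyprof2calltree is one converter I'm aware of), so check each tool's docs for the exact invocation.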
Still not exactly great. I'd like to get a flamegraph view of the profile[0] or a Chrome-style time series, as either provides a much clearer high-level picture, but I'm not sure cProfile records enough information to do so (or that it stores threading information in the profile, which would be necessary for a time series).
* By default, the ITIMER signals used by the profiler interrupt syscalls. Disable that by adding the following in plop.Collector.__init__() after the call to signal.signal():
    signal.siginterrupt(sig, False)
* Try all ITIMER modes, e.g. by changing the default in plop.Collector.__init__()
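To make the two suggestions above concrete, here is a rough standalone sketch of a timer-driven sampler. It is not plop's actual code: `SampleCollector`, `MODES` and `busy` are hypothetical names of mine, and it assumes a Unix-like system where `signal.setitimer` is available. It shows the `siginterrupt` call placed after `signal.signal()` and a loop over the three ITIMER modes.

```python
import collections
import signal
import time

# The three interval timers and the signal each one delivers.
MODES = {
    "prof": (signal.ITIMER_PROF, signal.SIGPROF),          # user + system CPU time
    "virtual": (signal.ITIMER_VIRTUAL, signal.SIGVTALRM),  # user CPU time only
    "real": (signal.ITIMER_REAL, signal.SIGALRM),          # wall-clock time
}


class SampleCollector:
    """Hypothetical minimal sampler; counts which function was interrupted."""

    def __init__(self, interval=0.005, mode="virtual"):
        self.interval = interval
        self.timer, self.sig = MODES[mode]
        self.samples = collections.Counter()
        signal.signal(self.sig, self._handler)
        # Must come after signal.signal(): installing a handler resets the
        # signal to "interrupt syscalls". False asks for syscalls to restart.
        signal.siginterrupt(self.sig, False)

    def _handler(self, signum, frame):
        # frame is the interrupted Python frame (it may be None).
        if frame is not None:
            self.samples[frame.f_code.co_name] += 1

    def start(self):
        signal.setitimer(self.timer, self.interval, self.interval)

    def stop(self):
        # A zero delay disarms the timer; the handler stays installed.
        signal.setitimer(self.timer, 0, 0)


def busy(seconds):
    """Burn roughly `seconds` of CPU time so every timer mode has work to sample."""
    deadline = time.process_time() + seconds
    total = 0
    while time.process_time() < deadline:
        total += sum(range(1000))
    return total


if __name__ == "__main__":
    for mode in MODES:
        collector = SampleCollector(mode=mode)
        collector.start()
        busy(0.2)
        collector.stop()
        print(mode, sum(collector.samples.values()), "samples")
```

Which mode works best depends on the workload: the CPU-time timers only tick while the process is actually running, so code that mostly waits on I/O will show few samples outside "real" mode. Also note that Python only runs signal handlers in the main thread.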
[1] http://www.vrplumber.com/programming/runsnakerun/
[2] http://kcachegrind.sourceforge.net/html/Home.html