The graph at the top shows you which methods took how much time. The wider a bar, the longer it took. So it's immediately obvious that there are two substitutions called from Kernel::Output::HTML::Layout::Output that take up nearly all of the time. That's where you can focus your optimizations on. The other methods only take so much time because they include calls to the offending methods.
You can click on the bars (or the the routines listed below the graph) to get a few of the code, annotated with call counts and time spent on each line.
This is from OTRS, when it still used its home-grown, hacky template system, which is the major offender here. After switching to a proper template system, this is what the same request looks like: http://moritz.faui2k3.org/tmp/otrs-nytprof/agentticketzoom-u...
You can see from the absolute numbers at the top that it's much faster (18.3s -> 2.12s), and there's no single spot anymore where all the time is spent.
It is rather typical that unoptimized code has one or more big time sinks, and profiling makes that obvious.
A common technique for looking at a program (not a database query) is "stack sampling". A timer goes off N times a second, and records the call stack of the process/threads. Then, statistics are gathered from the set of call stacks.
E.g. - "60% of the time is in the sequence main() -> process_unit() -> validate_input() and the things called from there" or "45% of the time is in all of the call sequences which then lead back into write_line()" or things like that. Usually, you see patterns of a small number of slow functions that everything depends upon and/or, arranging the call stacks into a sort of "tree" of calls, a branch of the tree that you spend an inordinate amount of time in.
There are newer tools that provide graphical representation of this data as well (e.g. - show the tree as sort of a topographical map, with hotspots as peaks, and the call tree as geological strata)
Of course, if your program has a giganto routine that it never leaves, you might not learn much -- "do_all_the_work_inline() is executing 95% of the time!", unless you start looking at things at the line number level. Blech.
The graph at the top shows you which methods took how much time. The wider a bar, the longer it took. So it's immediately obvious that there are two substitutions called from Kernel::Output::HTML::Layout::Output that take up nearly all of the time. That's where you can focus your optimizations on. The other methods only take so much time because they include calls to the offending methods.
You can click on the bars (or the the routines listed below the graph) to get a few of the code, annotated with call counts and time spent on each line.
This is from OTRS, when it still used its home-grown, hacky template system, which is the major offender here. After switching to a proper template system, this is what the same request looks like: http://moritz.faui2k3.org/tmp/otrs-nytprof/agentticketzoom-u...
You can see from the absolute numbers at the top that it's much faster (18.3s -> 2.12s), and there's no single spot anymore where all the time is spent.
It is rather typical that unoptimized code has one or more big time sinks, and profiling makes that obvious.