Of course I expect to see the call tree, to flip between that and a flat view, to switch between different metrics in those views, and to see them across processes and threads. I'm talking about graphical tools (perhaps coupled with external reduction of the data) like CUBE [1] and ParaProf [2], and those in toolsets like Open|SpeedShop [3] and HPCToolkit [4], with which I'm less familiar.