I've never understood why flame graphs are better than the normal presentation of inclusive and exclusive timings in performance tools, which may not be "modern" but embodies some decades' experience. Anyone care to explain?
Of course I expect to see the call tree, and to flip between that and a flat view, to flip between different metrics in the views, and to see them across processes/threads. I'm talking about graphical tools (perhaps coupled with external reduction of the data) like CUBE [1], Paraprof [2], and those in toolsets like Open|Speedshop [3] and HPCToolkit [4] with which I'm less familiar.
I'm far from a performance expert, but my impression is:
A flame graph shows the call paths to each function and what share of the time each path took, which is not so obvious from the typical table. On the other hand, finding functions that are called quite a lot from all over the place and add up is easier in the table, so the table hasn't become useless.
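To make that concrete, here is a rough sketch in Python of the two views computed from the same data. The sample stacks and function names (`main`, `parse`, `render`, `memcpy`, `draw`) are made up for illustration, and real profilers collect and reduce data in far more elaborate ways. The per-path counts are keyed by semicolon-joined stacks, similar to the "folded" lines that flame graph scripts commonly take as input.

```python
from collections import Counter

# Hypothetical sampled call stacks, outermost frame first, one tuple per sample.
samples = [
    ("main", "parse", "memcpy"),
    ("main", "parse", "memcpy"),
    ("main", "render", "memcpy"),
    ("main", "render", "draw"),
    ("main", "parse"),
]

def flat_profile(samples):
    """Per-function inclusive and exclusive sample counts (the table view)."""
    inclusive = Counter()
    exclusive = Counter()
    for stack in samples:
        # Exclusive: only the innermost frame was actually running.
        exclusive[stack[-1]] += 1
        # Inclusive: every distinct function on the stack, counted once per sample.
        for fn in set(stack):
            inclusive[fn] += 1
    return inclusive, exclusive

def path_profile(samples):
    """Sample counts per full call path (the data a flame graph draws)."""
    return Counter(";".join(stack) for stack in samples)

inclusive, exclusive = flat_profile(samples)
paths = path_profile(samples)

print("flat view (function, inclusive, exclusive):")
for fn, count in inclusive.most_common():
    print(f"  {fn:8s} {count:3d} {exclusive[fn]:3d}")

print("per-path view:")
for path, count in paths.most_common():
    print(f"  {path} {count}")
```

In the flat view `memcpy` stands out with 3 exclusive samples no matter who called it, which is the "adds up all over the place" case. The per-path view splits those same samples into 2 under `parse` and 1 under `render`, which is the part the table hides.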