by gnufx 2412 days ago
I've asked before without luck: How are flamegraphs preferable to the well-established sorts of visualizations in the common HPC performance tools, like CUBE, Paraver, and TAU, say? They typically provide at least inclusive or exclusive function/region views with choices of metrics for profiling and/or tracing over serial, threaded, or distributed execution.

Well, I'll start by saying I'm not familiar with any of those tools, though I took a quick look at them. It looks like Paraver offers a time-domain view of performance? And CUBE seems to offer a time-based view and a Graphviz-style drawing of the call tree.

In a flame graph, the width of a frame is proportional to the share of CPU time (strictly, of samples) spent in that frame and everything it calls, and the y-axis is stack depth, so each column is a particular call stack.

This means that you can quickly tell what functions, and from what call sites, are the most expensive.
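To make the width idea concrete, here is a rough sketch of how the widths could be computed from sampled stacks. It assumes the "folded" text format (frames root-first, joined by `;`, followed by a sample count); the sample data and the names `frame_widths` and `render` are made up for illustration, and this is not how any particular tool implements it.

```python
from collections import Counter

# Hypothetical sample data: one folded stack per line, then a sample count.
FOLDED = """\
main;parse;read_file 30
main;parse;tokenize 20
main;solve;matmul 40
main;solve 10
"""

def frame_widths(folded):
    """Map each call path (a tuple of frames) to its inclusive sample count."""
    widths = Counter()
    for line in folded.splitlines():
        if not line.strip():
            continue
        stack, count = line.rsplit(" ", 1)
        frames = tuple(stack.split(";"))
        # Every prefix of the stack was on-CPU for these samples,
        # so each ancestor frame gets the count added to its width.
        for depth in range(1, len(frames) + 1):
            widths[frames[:depth]] += int(count)
    return widths

def render(widths):
    """Return an indented text outline, one line per frame, with its share of samples."""
    # Total is the sum over root-level paths.
    total = sum(n for path, n in widths.items() if len(path) == 1)
    lines = []
    # Sorting tuples puts each parent path directly before its children.
    for path in sorted(widths):
        share = 100.0 * widths[path] / total
        lines.append(f"{'  ' * (len(path) - 1)}{path[-1]}: {share:.0f}%")
    return lines

widths = frame_widths(FOLDED)
for line in render(widths):
    print(line)
```

Because the key is the whole call path rather than just the function name, the same function reached from two different callers shows up as two separate frames, which is what lets you see the cost per call site.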

The only visualization I know of that matches the ability to quickly zero in on things while maintaining context is a call graph with nodes colored by cumulative CPU time, but laying out that graph is hard, and it is difficult to see everything at once.

That may be OK in simple cases where you can easily eyeball it, if you're only interested in aggregated CPU time as a metric, and if you gain the most from optimizing the obvious function in all modes of the program. That's not necessarily the case in complex scientific codes, for instance, especially parallel ones.

This is true. It's more useful for optimizing usage than it is for deep sleuthing of "why is this particular thing performing poorly".