Yes, I agree that higher-resolution data is not readily available. LLVM MCA has a timeline view that attempts to visualize the overlapping execution of instructions (https://llvm.org/docs/CommandGuide/llvm-mca.html#timeline-vi...), but this is based on models of how the CPU works (not runtime-collected data), and these models are not perfect.

I also agree that sampling profilers have the same issue: instruction-level views of sampling profiles should be taken with a grain of salt.

My concern is that flame graphs with 1-3ns of resolution are presented as a selling point of the tool, without any mention of the caveats around how this model really breaks down at this time scale. I would like to know more details of how the PT data actually relates to the out-of-order execution. Does a branch's timestamp correspond to when that branch was retired? Do we actually know what the timestamp corresponds to, or is it not well-specified? Are there cases where the timestamp is known to be misleading about the true bottleneck?

I don't know the answers to these questions, but when I see a tool like this, I really want more information about the strengths and limitations of the data.