by wyldfire
1521 days ago
So instead of sampling, or HW perf counters, IPT does tracing? Perf counters are able to attribute cycles or cache misses to instructions, but only in aggregate. If your program isn't 100% computation, cycles consumed won't necessarily point to the bottleneck. So IIUC, IPT can tell you more than just the statistics; it can tell the real story, i.e. the sequence of instructions executed? If so, I can see this painting a much clearer picture. But what are the limitations? Some small buffers that overflow after a few million instructions, or pipeline flushes? If the IPT buffer(s) overflow, does magic-trace indicate gaps? Is it possible to combine PMU or sampling with IPT to get multiple profiling dimensions in the same run? Not just what sequence of instructions was executed, but where in time and code the branch mispredictions, cache misses, etc. occurred?
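The difference between "only in aggregate" and "the sequence of instructions executed" can be shown with a toy example. This is a conceptual sketch only: the function names are hypothetical, and it models an idealized profile that sees every step rather than real sampling or real IPT output. Two executions with very different orderings produce identical aggregate profiles, while an ordered trace tells them apart.

```python
from collections import Counter

# Two hypothetical executions of the same three functions (illustrative names).
trace_a = ["parse", "parse", "compute", "compute", "write", "write"]
trace_b = ["parse", "compute", "write", "parse", "compute", "write"]


def aggregate_profile(trace):
    """What an aggregate (counter/sampling-style) profile keeps: counts, no order."""
    return Counter(trace)


def call_sequence(trace):
    """What an ordered trace keeps: consecutive runs in the order they happened."""
    runs = []
    for fn in trace:
        if runs and runs[-1][0] == fn:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([fn, 1])  # start a new run
    return [tuple(run) for run in runs]


print(aggregate_profile(trace_a) == aggregate_profile(trace_b))  # True
print(call_sequence(trace_a))  # [('parse', 2), ('compute', 2), ('write', 2)]
print(call_sequence(trace_b))
```

The aggregate view cannot distinguish "all parsing, then all computing" from interleaved work, which is exactly the information a timeline of the trace makes visible.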
In fact, if you look carefully at the demo gifs in the README, that trace had 5 decode errors! Nonetheless, it was extremely usable.
Snapshot sizes are configurable: you can go back as far as you like. However, the trace viewer tends to crash once trace files reach the hundreds of MB, and at that point you'll need to do some work to set up a trace processor outside of your browser for the UI to connect to. The UI will offer up some docs if you actually run into this.
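One way to picture the "go back as far as you like" trade-off is a fixed-size buffer that always holds the most recent events and overwrites the oldest. The sketch below is a toy model under that assumption; `SnapshotBuffer` and `run` are hypothetical names, and this is not magic-trace's or the kernel's actual implementation. A larger capacity reaches further back in history, at the cost of a bigger snapshot (and therefore bigger trace files for the viewer to load).

```python
from collections import deque


class SnapshotBuffer:
    """Toy overwrite-oldest ring buffer; `capacity` stands in for snapshot size."""

    def __init__(self, capacity):
        self._events = deque(maxlen=capacity)
        self.overwritten = 0  # how many old events were lost to wraparound

    def record(self, event):
        if len(self._events) == self._events.maxlen:
            self.overwritten += 1  # deque drops the oldest item on append
        self._events.append(event)

    def snapshot(self):
        """Return the retained history, oldest first."""
        return list(self._events)


def run(capacity, n_events):
    buf = SnapshotBuffer(capacity)
    for i in range(n_events):
        buf.record(i)
    return buf


small = run(capacity=4, n_events=10)
large = run(capacity=8, n_events=10)
print(small.snapshot())  # [6, 7, 8, 9]
print(large.snapshot())  # [2, 3, 4, 5, 6, 7, 8, 9]
```

Doubling the capacity here doubles how far back the snapshot reaches, which is the same knob that eventually makes trace files large enough to strain the in-browser viewer.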
I'm so glad you asked us about PMU events; we've been thinking a lot about those. They are only available in traces from the efficiency cores of Alder Lake CPUs, and nothing else. When we get our hands on a server-class part with PMU tracing, we'll add support ASAP. We conjecture that it will be absurdly useful to see cache events on a timeline next to call stacks.