by tbr1 1516 days ago
I think this answer has several parts:

- I imagine the extra memory bandwidth of newer parts doesn't hurt. The example traces were taken on server-class Ice Lake machines, and those just don't overflow for our typical workloads.

- We found the specific IPT configuration matters a lot. Turning off return compression makes overflows more likely. We allow varying this in magic-trace via the `-timing-resolution` parameter; the wiki has more detail. We don't typically see overflows under the default configuration, even on Broadwell server-class parts.

- Clark spent a week on an Intel NUC (mobile Tiger Lake part) toiling away on decode error recovery. For the most part, the data lost are uninteresting branches, and only one of the two events that bound a frame (the call into it or the return out of it) needs to survive the decode error for us to be able to construct a frame for it.
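
To make that last point concrete, here's a rough sketch of how a frame can be rebuilt when only one of its call/return events survives. This is not magic-trace's actual code (that's OCaml and handles far more cases). The event tuples, the `"gap"` marker for a decode error, and `rebuild_frames` are all made up for illustration, and it assumes a return event tells you which function is returning.

```python
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    name: str
    start: int
    end: int


def rebuild_frames(events, trace_start, trace_end):
    """Rebuild frames from (kind, time, name) events, tolerating lost events.

    Hypothetical event kinds: "call", "ret", and "gap" (a decode error).
    """
    stack = []              # open frames as (name, start_time)
    frames = []
    last_gap = trace_start  # earliest time a lost call could have happened

    for kind, t, name in events:
        if kind == "gap":
            last_gap = t
        elif kind == "call":
            stack.append((name, t))
        elif kind == "ret":
            # Find the nearest open frame with this name, searching from the top.
            depth: Optional[int] = next(
                (i for i in reversed(range(len(stack))) if stack[i][0] == name),
                None,
            )
            if depth is None:
                # The call was lost, but the return proves the frame existed.
                frames.append(Frame(name, last_gap, t))
            else:
                # Frames above the match lost their returns; close them here too.
                while len(stack) > depth:
                    open_name, start = stack.pop()
                    frames.append(Frame(open_name, start, t))

    # Anything still open never returned within the trace.
    while stack:
        open_name, start = stack.pop()
        frames.append(Frame(open_name, start, trace_end))

    return sorted(frames, key=lambda f: (f.start, -f.end))


events = [
    ("call", 0, "main"),
    ("call", 1, "parse"),
    ("gap", 2, None),    # lost here: the call into lex
    ("ret", 5, "lex"),
    ("gap", 6, None),    # lost here: the return out of parse
    ("ret", 9, "main"),
]
frames = rebuild_frames(events, trace_start=0, trace_end=10)
```

In this toy trace, `lex` still gets a frame from its surviving return (with its start guessed as the time of the gap), and `parse` still gets one from its surviving call (closed when `main` returns). The timestamps on the reconstructed edges are approximations, not measurements.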

We also considered periodic stack sampling for error recovery, but didn't implement it, since the decode error recovery we did implement turned out to be robust enough in practice.

We ended up having more trouble with runtimes that mess with the stack pointer directly. (The kernel does this for the retpoline Spectre mitigation! But perf is smart and rewrites that part of the instruction stream into a jump for us.) There's code in magic-trace to special-case OCaml exceptions, for instance, and it's likely similar code is necessary for some other runtimes too (we have an open issue for Go's coroutine switching).