by Sesse__ 489 days ago
Intel PT is indeed useful (although very, very slow compared to regular sampling profiling), but there are hardly any CPUs that actually implement PTWRITE. (IIRC there's some obscure Xeon or something?)

Typically you get a cycle count every six branches, give or take.

1 comment

Sampling profilers are indeed very low-overhead; however, they can't help debug tail latency, for which tracing profilers are indispensable:

https://yosefk.com/blog/profiling-in-production-with-functio...

https://danluu.com/perf-tracing/
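
To put rough numbers on the tail latency point, here's a tiny Python sketch. The request durations are made up purely for illustration (they're not measurements from either link): one request in a thousand is 20x slower than the rest, yet it accounts for only about 2% of total time, so an aggregated sampling profile barely registers it.

```python
# Made-up numbers for illustration: 999 fast requests and 1 slow one.
durations_ms = [10] * 999 + [200]

total_ms = sum(durations_ms)
mean_ms = total_ms / len(durations_ms)   # what an aggregate view reflects
worst_ms = max(durations_ms)             # what the unlucky user experiences

# Fraction of all time (and so, roughly, of all samples) spent in the slow request.
slow_share = worst_ms / total_ms

print(f"mean={mean_ms:.2f}ms worst={worst_ms}ms slow_share={slow_share:.1%}")
```

A trace of that one slow request shows what it actually did; a sampling profile only tells you it was a small slice of the total.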

Regarding the slowdown: magic-trace reports 2-10% slowdowns, which IMO is actually fine even for production (unless this adds up to a huge dollar cost, and for most people it won't), since in return you can actually debug the rare slowdowns that are the worst part of your user experience.

However, the hardware feature that I propose (https://yosefk.com/blog/profiling-in-production-with-functio...) would likely have lower overhead, since it relies on software issuing tracing instructions, e.g. at each function entry & exit (rather than at every control flow change), and it could be selective in various ways (e.g. exclude short functions without loops, and/or configure the hardware to ignore short calls). BTW maybe you can do this with Intel Processor Trace too; I'm just not really familiar with it.