by Sesse__
489 days ago
Intel PT is indeed useful (although very, very slow compared to regular sampling profiling), but there are hardly any CPUs that actually implement PTWRITE. (IIRC there's some obscure Xeon or something?) Typically you get a cycle count every six branches, give or take.
https://yosefk.com/blog/profiling-in-production-with-functio...
https://danluu.com/perf-tracing/
Regarding the slowdown: magic-trace reports 2-10% slowdowns, which IMO is actually fine even for production (unless this adds up to a huge dollar cost, which for most people it won't), since in return you can actually debug the rare slowdowns that are the worst part of your user experience.
However, the hardware feature that I propose (https://yosefk.com/blog/profiling-in-production-with-functio...) would likely have lower overhead, since it relies on software issuing tracing instructions, e.g. at each function entry & exit (rather than at every control flow change), and it could be selective in various ways (e.g. exclude short functions without loops, and/or configure the hardware to ignore short calls). BTW maybe you can do this with Intel Processor Trace too; I'm just not really familiar with it.
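
To make the entry/exit idea concrete, here is a rough software-only sketch in Python. It is not the proposed hardware feature and has nothing to do with Intel PT's actual interface; `FunctionTracer`, `traced`, and `min_duration_ns` are names I made up for illustration. It records one event per traced call into a fixed-size ring buffer and drops calls shorter than a threshold, which is roughly what "configure the hardware to ignore short calls" would mean.

```python
import functools
import time
from collections import deque


class FunctionTracer:
    """Software stand-in for entry/exit function tracing (hypothetical API)."""

    def __init__(self, min_duration_ns=0, capacity=1024, clock=time.perf_counter_ns):
        # Calls shorter than this are not recorded ("ignore short calls").
        self.min_duration_ns = min_duration_ns
        # Bounded ring buffer: once full, the oldest events are overwritten.
        self.events = deque(maxlen=capacity)
        # The clock is injectable so the tracer can be tested deterministically.
        self.clock = clock

    def traced(self, func):
        """Decorator that timestamps function entry and exit."""

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = self.clock()  # function entry
            try:
                return func(*args, **kwargs)
            finally:
                end = self.clock()  # function exit (also on exceptions)
                if end - start >= self.min_duration_ns:
                    self.events.append((func.__name__, start, end))

        return wrapper
```

You would decorate only the functions you care about (the "exclude short functions" part is then just not decorating them) and dump `tracer.events` when a slow request shows up. In real hardware the timestamping and filtering would happen off the critical path, so the overhead here says nothing about what the proposal would cost.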