Hacker News
by milianw 659 days ago
The premise of this website and articles like https://yosefk.com/blog/how-profilers-lie-the-cases-of-gprof... just shows that the authors are using the wrong tools. Nowadays it is relatively easy to also look at off-CPU time when profiling with perf (e.g. https://github.com/KDAB/hotspot/?tab=readme-ov-file#off-cpu-...). The idea is to use sampling for the on-CPU periods and then combine that with the off-CPU time measured between context switches. VTune has also supported this mode for many years.
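To make the "sampling plus context switches" idea concrete, here is a minimal sketch in Python. It is not how perf, hotspot, or VTune implement it. The input format, the function names (`parse`, `read_file`, `lock_wait`) and the 1 ms sample period are all made up for illustration. On-CPU time is estimated as sample count times sample period, and off-CPU time is the measured gap between a switch-out and the following switch-in.

```python
from collections import defaultdict


def combine_profile(cpu_samples, switches, sample_period):
    """Merge on-CPU samples and off-CPU intervals per function.

    cpu_samples:   leaf-function names, one entry per periodic sample
    switches:      (function, switched_out_at, switched_in_at) tuples, in seconds
    sample_period: seconds between two on-CPU samples
    Returns {function: [on_cpu_seconds, off_cpu_seconds]}.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for func in cpu_samples:
        # Each sample stands for one sample period of CPU time (an estimate).
        totals[func][0] += sample_period
    for func, out_at, in_at in switches:
        # Off-CPU time is measured directly between the two context switches.
        totals[func][1] += in_at - out_at
    return dict(totals)


# Hypothetical data: 4 on-CPU samples and 2 blocking intervals.
cpu_samples = ["parse", "parse", "parse", "read_file"]
switches = [("read_file", 0.010, 0.250), ("lock_wait", 0.300, 0.400)]

profile = combine_profile(cpu_samples, switches, sample_period=0.001)
by_total_time = sorted(profile.items(), key=lambda kv: -(kv[1][0] + kv[1][1]))
for func, (on_cpu, off_cpu) in by_total_time:
    print(f"{func:10s} on-CPU {on_cpu * 1000:6.1f} ms  off-CPU {off_cpu * 1000:6.1f} ms")
```

In this toy data `read_file` barely shows up in the on-CPU samples but dominates once the off-CPU intervals are added, which is the kind of case an on-CPU-only profile would miss.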

> The premise of this website and articles like https://yosefk.com/blog/how-profilers-lie-the-cases-of-gprof... just show that the authors are using the wrong tools. It is nowadays relatively easy to also look at off-CPU time when profiling with perf (e.g. https://github.com/KDAB/hotspot/?tab=readme-ov-file#off-cpu-...).

I think, firstly, that spending 15s trying the Ctrl-C approach is a worthwhile tradeoff. If you don't find anything, then sure, spend another 30-60 minutes setting up perf, KDAB, etc. Maybe more if you're on an embedded device.

Secondly, the author seems to say that he's used this on embedded devices with no output but a serial line for the debugger. This is also a 15s effort[1].

It's basically a very low effort task, takes seconds to determine if it worked or not, and if it doesn't work you've only lost a few seconds.

[1] I'm assuming that if you're developing on a device supporting a serial GDB connection, you've already got the debugger working.
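For anyone who hasn't tried it, the manual approach amounts to interrupting the program a handful of times, noting the backtrace each time, and seeing which functions keep showing up. A rough sketch of the tallying step in Python, assuming you have already copied the backtraces out by hand; the stacks and function names below are invented, not taken from any real program:

```python
from collections import Counter


def tally_stacks(stacks):
    """Return (function, fraction of samples containing it), most frequent first.

    stacks: list of backtraces, each a list of function names
            (innermost frame first, as a debugger prints them).
    """
    counts = Counter()
    for frames in stacks:
        # Count a function once per sample, even if it recurses.
        counts.update(set(frames))
    total = len(stacks)
    return [(func, n / total) for func, n in counts.most_common()]


# Five hypothetical backtraces collected by hitting Ctrl-C five times.
stacks = [
    ["memcpy", "serialize", "main"],
    ["memcpy", "serialize", "main"],
    ["read", "load_config", "main"],
    ["memcpy", "serialize", "main"],
    ["malloc", "serialize", "main"],
]

result = tally_stacks(stacks)
for func, share in result:
    print(f"{share:4.0%}  {func}")
```

With so few samples the percentages are very coarse, but a function that appears in four out of five stacks is worth a look either way. Because the program is stopped wherever it happens to be, a thread blocked in a syscall or on a lock shows up in the stacks too.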

perf is easily available through Yocto and Buildroot (and probably other embedded Linux image builders). hotspot can be downloaded as an AppImage. It should not take 30-60min to set this up, but granted, learning the tools the first time always has some cost.

Furthermore, note how your reasoning is quite different from what the website you linked to says - it basically says "there are no good tools" (which is untrue) whereas you are saying "manual GDB sampling might be good enough and is easier to set up than a good tool" (which is certainly true).

the vast majority of embedded cpus cannot run yocto or indeed linux, even the arms

but they all support gdb

True, that's another good point. But again, this reasoning is very different from the one in the linked article and website - if you have oprofile or valgrind's cachegrind available, you clearly could get perf set up instead.

I'm not debating that manual GDB sampling has its place and value. What I'm disputing is the claim that perf is "lying", or that it's impossible to get hold of off-CPU samples or to profile multithreaded code in general.

yes, agreed
(well, not all)
kreinin spends a lot of time debugging things that don't run on linux or any cpu architecture linux or vtune supports. even on amd64 linux, perf is not so useful with python, lua, node.js, browser js, shell scripts, etc.