by zznzz 1219 days ago
> Is DWARF unwinding so slow that that is actually faster?

No. The only reason it works like this is that the upstream Linux kernel has so far rejected in-kernel DWARF unwinders, whereas copying the stack is simpler and already implemented.

DWARF bytecode is a full VM. Do compiler writers test their DWARF output? (In my experience they do not, especially for architectures outside the big 2 or 3.) How does the kernel access the ELF file pages containing the DWARF information from inside an NMI handler? You could mlock all your debug information when a program loads, but the memory overhead wouldn't be nice. It is hard enough getting a build ID.

The elephant in the room btw is LBR call stacks, but they aren't exposed in the kernel/BPF yet. Userland perf has them and they recently became available on AMD.

It is not required to unwind the user space stack in the NMI handler. It can be done later before returning to user space in a context that can handle faults.
Allowing processes to sniff each other's stacks has some fairly obvious security issues.
I don’t understand your concern - what about this would involve one process sniffing another process’s memory? The kernel would still be doing the unwinding, just not in the NMI handler.
Wouldn't all your kernel stacks then end up in whatever this handler is? Why not implement your approach and mail it to LKML :-)
Yes, this only works for user space stacks, but that is sufficient: with ORC, kernel stacks are solved (IMO), and it avoids all the issues you mentioned with trying to mlock the debuginfo of all processes. The NMI handler would still unwind the kernel stack.
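
To make the two-phase idea concrete, here is a toy model in Python. It is not kernel code: `Task`, `nmi_sample`, and `return_to_user` are hypothetical names invented for illustration, and the "stacks" are just lists of function names. It only shows the ordering: the NMI step records what is safe to read without faulting, and the user stack is filled in later from a context that is allowed to fault.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a task; stacks are lists of function names."""
    name: str
    user_stack: list
    pending: list = field(default_factory=list)


def nmi_sample(task, kernel_stack):
    """NMI context: must not fault, so record only the kernel stack now."""
    sample = {"task": task.name, "kernel": list(kernel_stack), "user": None}
    task.pending.append(sample)  # user part is completed later
    return sample


def return_to_user(task):
    """Faultable context: unwind the user stack and complete pending samples."""
    user = list(task.user_stack)  # stands in for a real (e.g. DWARF) unwind
    done, task.pending = task.pending, []
    for sample in done:
        sample["user"] = user
    return done


task = Task("demo", ["main", "parse", "read"])
s = nmi_sample(task, ["sys_read", "vfs_read"])
completed = return_to_user(task)
```

The user stack cannot change between the sample and the return to user space, which is why deferring the unwind does not lose information in this model.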

> Why not implement your approach and mail it to LKML :-)

Because this would still be an in-kernel DWARF unwinder and I would expect an instant reject, and because I am lazy and/or don't care enough about this problem or Linux to work on it. Even if people could be persuaded, I don't have the interest or temperance to debate this with LKML.

Why is profiling done in the kernel for userspace stacks?
Because this is about PMU-based sampling, which involves triggering interrupts at some interval and doing the sampling while handling the interrupt.
Other than overhead, what is the advantage as opposed to handling the interrupt in the kernel and then delivering a signal to userspace? After all, isn't this the role of SIGPROF?
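
For reference, the SIGPROF route looks roughly like this in userland. This is a minimal sketch in Python (Unix only, main thread only) using the standard `signal.setitimer(ITIMER_PROF, ...)` call; the `profile` and `busy` helpers and the 1 ms interval are my own assumptions for the example. Note that CPython runs the handler between bytecode instructions rather than in true signal context, so this illustrates the mechanism and is not how a native profiler would do it.

```python
import collections
import signal
import traceback

samples = collections.Counter()


def on_sigprof(signum, frame):
    # `frame` is the Python frame that was running when the signal was handled.
    stack = tuple(f.name for f in traceback.extract_stack(frame))
    samples[stack] += 1


def busy(n):
    # CPU-bound work so that the ITIMER_PROF timer (CPU time) actually fires.
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile(func, *args, interval=0.001):
    """Run func(*args) while sampling its stack on every SIGPROF."""
    samples.clear()
    old_handler = signal.signal(signal.SIGPROF, on_sigprof)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        return func(*args)
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop the timer first
        signal.signal(signal.SIGPROF, old_handler)


profile(busy, 5_000_000)
for stack, count in samples.most_common(3):
    print(count, " > ".join(stack))
```

The timer here counts process CPU time, not PMU events, so you cannot sample on cache misses or branch mispredictions this way, and the unwinding happens inside the profiled process itself.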