by emfree 3595 days ago
Great question. I wondered the same thing a while ago, and tried to build one using SystemTap (https://github.com/emfree/pystap). A couple of reasons why this isn't easy:

* "Python" in general might mean you're on Linux/Windows/whatever, and it might mean CPython, PyPy, or some other runtime. But any out-of-process instrumentation is gonna have to be pretty platform/runtime specific.

* Even if we restrict ourselves to, say, CPython on Linux, the interpreter's internals aren't super friendly to this sort of inspection from the outside. You have to rely on and also work around implementation details.

Example: to get a Python call stack, you want to look at `_PyThreadState_Current` (basically the same idea as `ruby_current_thread` in that excellent linked post of Julia's, I think). But this happens to be NULL whenever the GIL is released, e.g. while a thread is blocked on network I/O, and then you're kind of out of luck. So you'll already have trouble usefully profiling even a single-threaded I/O-intensive program, since it spends most of its time in exactly that state.

* Oh and you pretty much need debug symbols in your CPython binary (I think? Tell me if this isn't true!). Most production CPython builds don't have them. So you have to get the right binary, and rebuild any application dependencies with C extensions. Not hard but annoying.
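
For contrast with the call-stack example above, here's a rough sketch of what the same sampling looks like from *inside* the process, where none of these problems apply. It is not what pystap does; it just uses the stdlib's `sys._current_frames()` to grab every thread's stack, including a thread that is parked in a blocking wait with the GIL released. The names `sample_stacks` and `blocked_worker` are made up for illustration.

```python
import sys
import threading
import time
import traceback
from collections import Counter


def sample_stacks(duration=0.2, interval=0.01):
    """Periodically record the Python stack of every other thread.

    Returns a Counter mapping a tuple of function names
    (outermost first) to the number of samples that saw it.
    """
    counts = Counter()
    sampler_id = threading.get_ident()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        # Maps thread id -> that thread's topmost frame object.
        for thread_id, frame in sys._current_frames().items():
            if thread_id == sampler_id:
                continue  # don't profile the sampler itself
            stack = tuple(f.name for f in traceback.extract_stack(frame))
            counts[stack] += 1
        time.sleep(interval)
    return counts


def blocked_worker(stop):
    # Stand-in for a thread stuck in I/O: a blocking wait
    # gives up the GIL until the event is set.
    stop.wait()


if __name__ == "__main__":
    stop = threading.Event()
    worker = threading.Thread(target=blocked_worker, args=(stop,))
    worker.start()
    counts = sample_stacks()
    stop.set()
    worker.join()
    for stack, n in counts.most_common(3):
        print(n, " > ".join(stack))
```

The catch, of course, is that this needs code running inside the target process, which is exactly what an out-of-process profiler is trying to avoid.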

There is potential though! With some work, we definitely could have a better story for out-of-process Python profiling a la Linux perf.