Hacker News
by dmurray 637 days ago
This looks great. I have a use case for something similar: detecting calls to the file system. Lots of code I've inherited has a habit of loading configuration from some random network share, then failing when that config gets moved or the production host doesn't have the same access.

I usually use strace(1) to track these down, but it's nowhere near as ergonomic as this tool. I'm wondering now if I could patch the `open` built-in instead.
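For what it's worth, patching `open` is doable. Below is a minimal sketch, assuming you only care about code that looks up the name `open` through `builtins` at call time. `log_builtin_open` and `load_config` are hypothetical names for illustration. Calls that go through `io.open` directly (as far as I can tell, recent CPython `pathlib` does this), `os.open`, C extensions, or a reference to `open` saved at import time will slip past it.

```python
import builtins
import contextlib
import os
import sys


@contextlib.contextmanager
def log_builtin_open(records):
    """Temporarily swap builtins.open for a wrapper that logs each call site."""
    real_open = builtins.open

    def logging_open(file, mode="r", *args, **kwargs):
        # sys._getframe(1) is the frame that called open(); it is not
        # guaranteed to exist on every Python implementation.
        caller = sys._getframe(1)
        records.append((file, mode, caller.f_code.co_name, caller.f_lineno))
        return real_open(file, mode, *args, **kwargs)

    builtins.open = logging_open
    try:
        yield records
    finally:
        builtins.open = real_open  # always restore the real built-in


def load_config(path):  # hypothetical code under investigation
    with open(path) as fh:
        return fh.read()


if __name__ == "__main__":
    calls = []
    with log_builtin_open(calls):
        load_config(os.devnull)
    for file, mode, func, line in calls:
        print(f"open({file!r}, {mode!r}) from {func}, line {line}")
```

The context manager restores the real built-in on exit, so the patch doesn't leak into unrelated code even if the traced call raises.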


Since 3.8, CPython has built-in audit events, including `open`, so you don't need to patch anything or use anything external. Just add an audit hook with `sys.addaudithook()`.

Quick example:

    import inspect
    import pathlib
    import sys


    def callsite():
        try:
            pathlib.Path("/tmp/file").open()
        except OSError:
            pass


    def audit_hook(event, args):
        if event == "open":
            path, mode, flags = args
            print(f"audit: open({path!r}, {mode!r}, 0o{flags:o})")
            # Not using traceback here because traceback will attempt to read the
            # source file, causing an infinite recursion of audit events.
            f = inspect.currentframe()
            while f := f.f_back:
                print(
                    f'File "{f.f_code.co_filename}", line {f.f_lineno}, in {f.f_code.co_name}'
                )


    def main():
        sys.addaudithook(audit_hook)
        callsite()


    if __name__ == "__main__":
        main()

Prints:

    audit: open('/tmp/file', 'r', 0o100000000)
    File "/path/to/python/lib/python3.12/pathlib.py", line 1013, in open
    File "/tmp/audit.py", line 10, in callsite
    File "/tmp/audit.py", line 26, in main
    File "/tmp/audit.py", line 30, in <module>

https://docs.python.org/3/library/audit_events.html
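Adapted to the network-share case, here is a sketch that records only opens under a given prefix. That way you get a short list of offenders instead of every file Python touches. `WATCHED_PREFIXES`, `HITS`, and `read_legacy_config` are hypothetical placeholders. Audit hooks run before the operation they audit, so a config that has since moved still gets recorded even though the `open()` then fails.

```python
import os
import sys

# Hypothetical locations of the network share(s) you want to flag.
WATCHED_PREFIXES = ("/mnt/shared/", "//fileserver/")
HITS = []  # (path, "file:line in function") for every matching open


def watch_share_opens(event, args):
    # Keep this I/O-free: anything that opens a file here (traceback,
    # logging to a file) raises another "open" event and re-enters the hook.
    if event != "open":
        return
    path = args[0]
    if not isinstance(path, (str, bytes, os.PathLike)):
        return  # e.g. an integer file descriptor
    path = os.fsdecode(path)
    if path.startswith(WATCHED_PREFIXES):
        # Frame 1 is the Python code that triggered the open.
        caller = sys._getframe(1)
        where = f"{caller.f_code.co_filename}:{caller.f_lineno} in {caller.f_code.co_name}"
        HITS.append((path, where))


# There is no API to remove an audit hook, so install it once at import time.
sys.addaudithook(watch_share_opens)


def read_legacy_config():  # hypothetical code under investigation
    with open("/mnt/shared/app.ini") as fh:
        return fh.read()


if __name__ == "__main__":
    try:
        read_legacy_config()
    except OSError as exc:
        print("config load failed:", exc)
    for path, where in HITS:
        print(f"{path} opened from {where}")
```

Because the hook sees every audit event in the process for its whole lifetime, the early `return` for non-`open` events keeps the overhead small.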
Sounds perfect. I didn't know of this, but I think I'll start here.

Went spelunking through the source. I think you absolutely could!

There's not a whole lot that's really HTTP-library-specific. It uses the traceback module in a decorator, which is then manually wrapped around the relevant functions of each library the author cared about.

https://github.com/cle-b/httpdbg/blob/main/httpdbg/hooks

Should be easy enough to extend this to other libraries.
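Here is a generic sketch of that pattern (not httpdbg's actual code). It pairs a decorator that uses `traceback` to capture the caller with a helper that swaps the decorator in for a named function. `record_callsite`, `patch`, `CALLS`, and `load_settings` are hypothetical. `configparser.ConfigParser.read` is just an example target that happens to fit the config-loading use case.

```python
import configparser
import functools
import os
import tempfile
import traceback

CALLS = []  # (qualified function name, "file:line in caller") per call


def record_callsite(func):
    """Hypothetical decorator: log who called func, then call it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # extract_stack()[-1] is this wrapper's frame; [-2] is its caller.
        caller = traceback.extract_stack()[-2]
        where = f"{caller.filename}:{caller.lineno} in {caller.name}"
        CALLS.append((func.__qualname__, where))
        return func(*args, **kwargs)
    return wrapper


def patch(owner, name):
    """Hypothetical helper: replace owner.name with a recording wrapper."""
    setattr(owner, name, record_callsite(getattr(owner, name)))


# Extending to other libraries is one patch() call per function of interest.
patch(configparser.ConfigParser, "read")


def load_settings(path):  # hypothetical code under investigation
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.ini")
        with open(path, "w") as fh:
            fh.write("[db]\nhost = localhost\n")
        load_settings(path)
    for name, where in CALLS:
        print(name, "called from", where)
```

This replaces the attribute at runtime. Any code that grabbed a reference before `patch()` ran (e.g. `from somemodule import func`) keeps calling the unwrapped original.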

Super cool tool, thanks for sharing @dmurray!

If you're on Linux or Windows you can use Procmon; on macOS, use Instruments.

https://github.com/Sysinternals/ProcMon-for-Linux

You might find the syscall tracing functionality of Cirron useful: https://github.com/s7nfo/Cirron