

by cubes 1488 days ago

This looks really neat. One thing I noticed while reading the source code: it appears to actually import the modules.

Quoting the docstring on the `track_module` function:

    """This function executes the tracking of a single module by launching a
    subprocess to execute this module against the target module. The
    implementation of this tracking resides in the __main__ in order to
    carefully control the import ecosystem.

Source: https://github.com/IBM/import-tracker/blob/67a1e84e5a609e52e...

Here's the actual subprocess call: https://github.com/IBM/import-tracker/blob/67a1e84e5a609e52e...

    # Launch the process
    proc = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, env=env)

I think this is clever, and maybe even necessary, but feels risky to do on unaudited third-party Python libraries.

Maybe I'm misunderstanding something?


jreese 1488 days ago

> I think this is clever, and maybe even necessary, but feels risky to do on unaudited third-party Python libraries.

This is why my coworker built the project he called "dowsing"; it tries to understand as much as possible from the setup.py's AST, without actually executing it.

https://github.com/python-packaging/dowsing
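To make the "read the AST, don't execute it" idea concrete, here is a minimal sketch using only the standard library `ast` module. It is not dowsing's actual implementation; the `read_setup_kwargs` helper and the sample `setup.py` text are hypothetical, and it only recovers keyword arguments that are plain literals.

```python
import ast

# Hypothetical setup.py contents, held as a string so nothing is executed.
SETUP_PY = '''
from setuptools import setup

setup(
    name="example_pkg",
    version="0.1.0",
    install_requires=["requests>=2.0", "numpy"],
)
'''


def read_setup_kwargs(source: str) -> dict:
    """Return the literal keyword arguments of the first setup(...) call found.

    The source is only parsed, never executed. Values that are not plain
    literals (for example a variable or a function call) are reported as None.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both `setup(...)` and `setuptools.setup(...)`.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        found = {}
        for kw in node.keywords:
            if kw.arg is None:  # a **kwargs expansion cannot be resolved statically
                continue
            try:
                found[kw.arg] = ast.literal_eval(kw.value)
            except (ValueError, TypeError, SyntaxError):
                found[kw.arg] = None
        return found
    return {}


kwargs = read_setup_kwargs(SETUP_PY)
print(kwargs["install_requires"])
```

The trade-off is that anything computed at runtime (reading a requirements file, building the list in a loop) is invisible to this kind of analysis.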


cubes 1488 days ago

Neat, I'll take a look! I thought I was going to need to write something similar!


gabegoodhart 1488 days ago

Hi, I'm the main author of import_tracker. Thanks for taking the time to dig into it! It's a really interesting point that the subprocess.Popen could itself be a security concern. The command that's being executed is the __main__ of the import_tracker library itself (which is not something that a user can configure), so is your concern that import_tracker itself is untrusted and might be a concern for users running this on their machines?

For context on why I'm using the subprocess here, this allows the tracking to correctly allocate dependencies that are imported more than once (think my_lib.submod1 and my_lib.submod2 both need tensorflow, but my_lib.submod3 doesn't).
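A rough sketch of why a fresh process per module helps, assuming the tracking works by comparing `sys.modules` before and after an import (this is an illustration, not import_tracker's actual code). The standard-library modules `fractions` and `statistics` stand in for my_lib.submod1 and my_lib.submod2, with `decimal` playing the role of the shared dependency; the helper names are hypothetical.

```python
import importlib
import json
import subprocess
import sys

# Script run in the child interpreter: snapshot sys.modules, import the
# target, and print the modules that appeared as a JSON list.
_CHILD = (
    "import sys, json, importlib\n"
    "before = set(sys.modules)\n"
    "importlib.import_module(sys.argv[1])\n"
    "print(json.dumps(sorted(set(sys.modules) - before)))\n"
)


def imports_in_fresh_process(module_name):
    """Modules newly loaded by importing module_name in a clean interpreter."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD, module_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return set(json.loads(result.stdout))


def imports_in_this_process(module_name):
    """Same measurement, but sharing sys.modules with everything imported so far."""
    before = set(sys.modules)
    importlib.import_module(module_name)
    return set(sys.modules) - before


# In one process, whichever module is imported second cannot "see" the shared
# dependency, because it is already cached in sys.modules.
first = imports_in_this_process("fractions")
second = imports_in_this_process("statistics")
print("decimal attributed to statistics in-process:", "decimal" in second)

# In separate processes, each module gets the shared dependency attributed.
print("decimal attributed to fractions:", "decimal" in imports_in_fresh_process("fractions"))
print("decimal attributed to statistics:", "decimal" in imports_in_fresh_process("statistics"))
```

Note that the child itself imports `json` and `importlib` before taking the snapshot, so anything those pull in is hidden from the result; a real tool would need to account for that.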


cubes 1487 days ago

Hi! I think that, in my cursory reading, I misunderstood what the code is doing. I thought it was importing the module you're trying to analyze... I'll have to read more closely when I have some spare time.


gabegoodhart 1487 days ago

Makes sense! I think the commenter below correctly addressed the true security concern here, which is importing arbitrary Python libraries. As is, import_tracker doesn't attempt to solve this problem (though it's an interesting one to consider for this or a similar library). Please feel free to reach out with any other questions if you're curious.


SnowflakeOnIce 1487 days ago

No, you understand. Indeed, by importing Python code, you execute Python code, and so there could be an execution path for malicious code to run.

FYI, pylint does something similar for native-code extension modules (unless this changed in the past few years): it imports them dynamically!

EDIT: after reading the code more closely and reading the rest of the comments, to be more precise: it's not the subprocess call itself, but rather importing an arbitrary Python module, that could be a path for code execution. This is the case generally with Python: importing a module executes code, and so even just importing (not otherwise executing) an untrusted module could be problematic.
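A small self-contained demonstration of that last point. The module name `untrusted_demo` and the marker file are made up for the example; the "payload" here just writes a file, but it could be any code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "marker.txt"

    # A module whose top level has a side effect. Nothing in it is ever
    # called explicitly; the statement runs as part of the import itself.
    module_path = Path(tmp) / "untrusted_demo.py"
    module_path.write_text(
        "from pathlib import Path\n"
        f"Path({str(marker)!r}).write_text('ran at import time')\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        importlib.import_module("untrusted_demo")
    finally:
        # Clean up so the demo leaves the interpreter state as it found it.
        sys.path.remove(tmp)
        sys.modules.pop("untrusted_demo", None)

    side_effect_happened = marker.exists()

print("top-level code ran during import:", side_effect_happened)
```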


gabegoodhart 1487 days ago

Yep, this is spot on. As written, import_tracker does indeed do a dynamic import of the library in question and you're right that this introduces the possibility of arbitrary code execution. Currently, import_tracker is designed for library authors where the library in question is a trusted library that has dependency sprawl.

It's a very interesting use case to consider how a similar solution could work as a sandbox for investigating supply chain concerns with third-party libraries that have transitive dependencies. I think some of the static analysis tools referenced in other comments would address this better, since the real concern there is detecting the presence of transitive dependencies that may be malicious, as opposed to identifying exactly where in the target library those dependencies are used.
