Hacker News
by woodruffw 1213 days ago
PyPI still doesn't have this because no packaging ecosystem does. It's impossible to do in the general case if your packaging schema allows arbitrary code execution, which Python (and Ruby, and NPM, and Cargo, etc.) allow.

The closest thing is pattern/AST matching on the package's source, but trivial obfuscation defeats that. There's also no requirement that a package on PyPI is even uploaded with source (binary wheel-only packages are perfectly acceptable).
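
To make the obfuscation point concrete, here is a minimal Python sketch of name-based AST matching. The denylist `SUSPICIOUS_CALLS`, the function `flag_suspicious_calls`, and both sample snippets are made up for illustration; this is not how PyPI or any real scanner works.

```python
import ast

# Hypothetical denylist for illustration; real scanners use far richer rules.
SUSPICIOUS_CALLS = {"eval", "exec", "system", "urlopen"}


def flag_suspicious_calls(source: str) -> set:
    """Return the denylisted function names that `source` calls by name."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Plain call such as exec(...)
        if isinstance(func, ast.Name) and func.id in SUSPICIOUS_CALLS:
            found.add(func.id)
        # Attribute call such as os.system(...)
        elif isinstance(func, ast.Attribute) and func.attr in SUSPICIOUS_CALLS:
            found.add(func.attr)
    return found


# The sources are only parsed here, never executed.
PLAIN = "import os\nos.system('echo hello')\n"

# Same behaviour at runtime, but the attribute name is assembled from string
# pieces, so "system" never appears as an identifier in the syntax tree.
OBFUSCATED = "import os\ngetattr(os, 'sys' + 'tem')('echo hello')\n"
```

The matcher flags `PLAIN` but returns an empty set for `OBFUSCATED`, even though both would run the same command. Adding a rule for `getattr` only moves the problem to the next trick (`__import__`, `base64`, and so on).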

4 comments

"no packaging ecosystem does."

This is a little bit too strong, since packaging doesn't require arbitrary code execution. For example, Go doesn't permit arbitrary code execution during `go get`. There have been bugs that permitted code execution (like https://github.com/golang/go/issues/22125), but those are treated as security vulnerabilities.

Of course, you're right about Python.

What I meant by that is that no packaging ecosystem (to my knowledge) runs arbitrary uploaded code to find network activity. Some may do simpler, static analyses, but outright execution for dynamic analysis purposes isn't something I'm aware of any ecosystem doing.

Python, Ruby, et al. are in an even worse position than that baseline, since they have both arbitrary code in the package itself and arbitrary code in the package's definition. But the problem is a universal one!

Ah, yep, you're right about that as far as I know too.
This seems eminently solvable though. Why can’t every package submission cause some minimal sandboxed Docker image to install the package and call the various functions and methods and log all network and disk activity? If anything looks suspicious it would be denied and the submitter would have to appeal it, explaining why the submission is valid. The same applies for NPM and Cargo. I know there is a researcher out there who has retrieved and installed every single pip package to do an analysis, which is a good start. This seems like the kind of thing that wouldn’t even cost all that much, and big corporate users of Python would stand to benefit.
For one, because Docker is not a sandbox, and containers are not a strong security boundary[1]. What you really need here is a strongly isolated VM, at which point you're playing cat-and-mouse games with your target: their new incentive is to detect your (extremely detectable) VM, and your job is to make the VM look as "normal" as possible without actually making it behave normally (because this would mean getting exploited). That kind of work has a long and frustrating tail, and it's not particularly fruitful (relative to the other things packaging ecosystems can do to improve package security).

> I know there is a researcher out there who has retrieved and installed every single pip package to do an analysis, which is a good start.

You're probably talking about Moyix, who did indeed download every package on PyPI[2], and unintentionally executed a bunch of arbitrary code on his local machine in the process.

[1]: https://cloud.google.com/blog/products/gcp/exploring-contain...

[2]: https://moyix.blogspot.com/2022/09/someones-been-messing-wit...

You make some good points. But it still seems to me that, if you used the best available sandboxed VMs for each platform (Windows Sandbox for Windows; Firejail for Linux; VirtualBox with no folder permissions for OSX-- I don't know if these are the best or even good, those were the ones I found from a bit of searching), that you could install and run these packages in an automated way (especially with some GPT3-type help to figure out how to explore and call the important functions) and look for the telltale signs in the network and file access behavior that they are malicious. Even if we grant that this is a long-tailed "cat and mouse" game, then so what? We won't get 100% security, especially against super sophisticated threat actors, but if you could catch 98% or whatever of the typical clumsy supply chain attacks, or super egregious stuff like that NPM package that deleted your whole disk if you were Russian, that would be an incredibly vast improvement over the current state of affairs. Why isn't that worth doing? Why isn't Google or Microsoft at least trying this?
It isn't worth doing because the equation you've supplied doesn't include the effect of catastrophic failure: dynamic analysis lowers the barrier for exploit to a single hypervisor or VM exploit. Catching 98% of spam packages that affect nobody is worth very little when the 2% you don't catch are the ones that do the real damage.

> Why isn't Google or Microsoft at least trying this?

They are: Google and Microsoft both spend (tens of) millions of dollars on hypervisor and VM isolation research each year. It's a huge field.

> What you really need here is a strongly isolated VM,

Simplify, don't use a VM.

Create an isolated network, hook your sacrificial machine up to it, have it install the package. Remotely kill it (network controlled power switch if needed). The machine's hard drive should be hooked up through a network controlled switch of some type. After the sacrificial machine is powered down, reroute the HD so it is connected to a machine that does forensics.

Now you have clear "before" and "after" states set up for analysis.

The sacrificial machine's network activity can be monitored by way of whatever switch/router it uses to connect to the Internet.
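
The "before" and "after" comparison could be sketched roughly like this in Python, assuming the forensics machine can mount the drive's filesystem as a directory tree. `snapshot`, `diff_snapshots`, and `demo` are hypothetical helpers, and a real workflow would also need to look at boot sectors, permissions, timestamps, and anything else outside plain file contents.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def snapshot(root: str) -> dict:
    """Map each file's path (relative to `root`) to the SHA-256 of its contents."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            rel = path.relative_to(base).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(before: dict, after: dict) -> dict:
    """Classify paths as added, removed, or modified between two snapshots."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(
            p for p in before.keys() & after.keys() if before[p] != after[p]
        ),
    }


def demo() -> dict:
    """Simulate an install that changes, deletes, and creates files in a temp dir."""
    root = tempfile.mkdtemp()
    Path(root, "a.txt").write_text("one")
    Path(root, "b.txt").write_text("two")
    before = snapshot(root)

    # Stand-in for "install the package on the sacrificial machine".
    Path(root, "a.txt").write_text("changed")
    os.remove(os.path.join(root, "b.txt"))
    Path(root, "c.txt").write_text("new")

    return diff_snapshots(before, snapshot(root))
```

Calling `demo()` reports `c.txt` as added, `b.txt` as removed, and `a.txt` as modified. The diff only says what changed, not whether a change is malicious; that judgment is still the hard part.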

This is a VM, but flakier and with more steps! It’s also eminently not sustainable on PyPI’s scale, which is the context we’re talking about. I’m
Doesn't it solve VM sandbox escape problems though? Actual physical hardware isolation, along with an isolated network. Code can't detect it is running on a VM if there isn't a VM, and it sure can't escape the sandbox if there isn't a sandbox.

> It’s also eminently not sustainable on PyPI’s scale, which is the context we’re talking about.

I started my software engineering career in testing before VMs were a thing, so large, very large, scale test setups like the one I outlined were commonplace. I wrote about some of my experiences at https://meanderingthoughts.hashnode.dev/how-microsoft-tested... and the physical hardware setup my team was using to run (millions of!) tests was tiny compared to what other teams in Microsoft did at the time.

Network controlled power and peripherals were exactly how automation was done back in the day. Instead of VM images, you got a bunch of identical(ish) hardware and you wrote fresh images to hard drives to reset your state.

Are VMs more convenient? Sure, but my reply was in the context of ensuring malware can't detect it is running in a VM!

Well, some calls absolutely should invoke network or disk activity, so you would additionally need to define what constitutes good and bad activity for each. Moreover, unless the package is a collection of pure functions, it would be easy to hide the malware trigger in state that won't be initialized properly by the automated method calls but would be in the standard usage of the package.
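
A toy Python illustration of the state problem, with nothing harmful in it: `Client`, `naive_harness`, and the `observed_side_effects` list are all hypothetical, and the "payload" is just an append to a list standing in for network or disk activity.

```python
# Stand-in for a real side effect (network call, file write); nothing harmful happens.
observed_side_effects = []


class Client:
    """Hypothetical package API whose extra behaviour is gated on configured state."""

    def __init__(self):
        self.endpoint = None  # only set by callers following the documented usage

    def configure(self, endpoint):
        self.endpoint = endpoint

    def fetch(self):
        if self.endpoint is None:
            return "no-op"  # the path an uninformed harness exercises
        # Reached only after realistic setup; this is where a payload could hide.
        observed_side_effects.append(f"contacted {self.endpoint}")
        return "ok"


def naive_harness(cls):
    """Instantiate `cls` and call every public method once with no arguments."""
    obj = cls()
    results = {}
    for name in dir(obj):
        if name.startswith("_"):
            continue
        attr = getattr(obj, name)
        if callable(attr):
            try:
                results[name] = attr()
            except TypeError:
                results[name] = "skipped (needs arguments)"
    return results
```

`naive_harness(Client)` skips `configure`, sees `fetch` return `"no-op"`, and records no side effects. A user who calls `configure(...)` and then `fetch()`, as the docs would tell them to, hits the gated branch.
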
> It's impossible to do in the general case if your packaging schema allows arbitrary code execution

Java's type system: ClassLoaders plus SecurityManager was impossible?

that's literally how Java applets worked, enforced through the type system

https://docstore.mik.ua/orelly/java-ent/security/ch03_01.htm

yes, SecurityManager was a poor implementation for many reasons, but it's definitely not "impossible" to sandbox downloaded code from the network while having it interact with other existing code, you can do it with typing alone

I'm not sure it's not do-able, actually. What about having an execution sandbox and a way to check the calls made during the execution of the install script for instance?

I worked on something like this a few years back. It went nowhere, but I still believe it would be doable and useful. The only trace I could find is https://wiki.python.org/moin/Testing%20Infrastructure, which contains almost no info...
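
For the "check the calls made during the install script" part, one possible sketch uses CPython's audit hooks (`sys.addaudithook`, PEP 578). The watch list, `run_and_record`, and `demo` are illustrative assumptions. This only observes events and is not the execution sandbox: the Python docs say audit hooks are not suitable for building one, and native code can bypass them entirely.

```python
import os
import sys
import tempfile

# Names of CPython audit events (PEP 578); which ones to watch is an illustrative choice.
WATCHED_EVENTS = {"open", "socket.connect", "subprocess.Popen", "os.system"}
recorded = []
_recording = False


def _audit_hook(event, args):
    # Hooks cannot be removed once added, so recording is toggled with a flag.
    if _recording and event in WATCHED_EVENTS:
        recorded.append((event, args[0] if args else None))


sys.addaudithook(_audit_hook)


def run_and_record(script_source: str) -> list:
    """Execute `script_source` in this process and return the watched events it raised."""
    global _recording
    recorded.clear()
    _recording = True
    try:
        code = compile(script_source, "<install-script>", "exec")
        exec(code, {"__name__": "__main__"})
    finally:
        _recording = False
    return list(recorded)


def demo():
    """Run a harmless stand-in 'install script' that writes one file in a temp dir."""
    target = os.path.join(tempfile.mkdtemp(), "marker.txt")
    script = f"open({target!r}, 'w').close()"
    return target, run_and_record(script)
```

`demo()` returns the target path and an event list that includes `("open", target)`. Deciding which of those events are legitimate for a given package is the same policy problem raised earlier in the thread.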