Hacker News new | ask | show | jobs
by crabbone 51 days ago
> Looking back ten years to `left-pad`, are there more successful attacks now than ever?

I can't vouch for the number of attacks, but, and since we are talking about Python, nothing substantially changed since the time of `left-pad`. The same bad things that enabled supply chain attacks in Python ten years ago are in place today. However, it looks like there are more projects and they are more interconnected than before, so, it's likely that there are either more supply chain attacks, or that they are more damaging, or both.

Here's my anecdotal experience with Python's packaging tools. For a while, I was maintaining a package to parse libconfuse configuration language. It started as a Python 2.7 project, but at the time there was already some version of Python 3 available, so, it was written in a way that was supposed to be future-proof.

I didn't need to change the code of the project in the last ten or so years, but roughly once a year something would break in the setup.py. Usually, because PyPA decided to remove a thing that didn't bother anyone.

When Python 3.13 came out, as clockwork, setup.py broke. I rolled up my sleeves and removed the dependency on setuptools, instead, I wrote some Python code that generated a wheel from the project's sources. I didn't look up the specification of the RECORD file in dist-info directory, and assumed that sha256().hexdigest() will generate the checksums in the desired format. And that's how I shipped my packages...

Some time later, the company added an AI reviewer to the company's repo and it discovered that instead of hexdigest() the checksums have to be base64-encoded and then padding removed...

Now, to the punchline: nobody cared. The incorrectly generated packages installed perfectly fine without warnings. Nobody checks the checksums.

More so: nobody checks that during `pip install` or the more fancy `uv pip install` the packages aren't built locally (i.e. nobody cares that package installation will result in arbitrary code execution). It's not just common, it's almost universal to run `pip install` on production machines as a means of deploying a Python program. How do I know this? -- The company I work for ships its Python client as a... source package. Not intentionally. We are just lazy. But nobody cares.

5 comments

> It's not just common, it's almost universal to run `pip install` on production machines as a means of deploying a Python program.

Maybe a Python culture problem; maybe a hallmark of Python's status as an "easy to hire for", manager-friendly, least common denominator blub language; maybe a risk that stems from the conveniences of interpreter languages... but this is such a shame in this day and age.

It's seriously not difficult to do better. And if this is what you're doing, you're also missing out on reproducible environments both in dev and in prod. At least autogenerate a Nix package! You still don't need to publish any artifacts, but you can at least have the thing build in a sandbox or yeet the whole closure over SSH.

It's also not that hard to get a Docker image out of a Python project.

You only need one platform-minded person on the whole development team to make this happen.

What is going on???

"Almost universal" is a bit of a stretch, most of the time these days Python apps are deployed as Docker containers, and if you're using k8s this becomes effectively mandatory.

However a lot of the time especially for older codebases the docker build will just run pip install from public pypi without a proper lockfile.

So at least install code isn't being executed on your production machine, but still significant surface area for supply chain attacks

Well, the install code can leave some code behind that will be executed on the production machine... It doesn't really help being in a container. While a separate problem from Python ecosystem, people really put a lot more faith in isolation offered by containers than they should. Also, it's often very tempting to poke holes in that isolation because it's difficult and up to impossible sometimes to get things done otherwise.
As scary as it is right now, it warms your heart a little bit that this system existed for 30 years and is only now reaching a crisis point.

I ran an open source project with tens of thousands of downloads (presumably all either developer machines or webservers, so even a small number is valuable) and never received a malicious pull request, offer of a bribe to install malware, or a phishing attempt with enough effort to even catch my attention.

What it says to me is that there weren't a lot of people working on the crime side of this. It's like dropping your wallet in a bar bathroom and coming back to find it still there.

It's probably the same people, who think that merely having a requirements.txt stating packages with versions or even without that (2010 sends its regards) is fine. Open a random open source Python project on GitHub, and chances are you will see this kind of thing. Stands to reason, that people in companies are not acting much different.
left-pad was an npm issue
I'm referring to it as an anchor on the timeline. Not saying it's a Python issue.
virtualenv isn't relocatable out of the box, so how else would you deploy a python project?

You can call it laziness, but it's not like the python ecosystem has ever developed an answer for this problem. The only reasonable answer has been to use docker, which is basically admitting that the python community did nothing.

Oh, that's a sore spot with me, but I'm glad you asked!

So, for the purpose of full disclosure, I have a personal and professional grudge with PyPA, which also touches on how pip is being managed, beside other packaging issues. It's not the side you want to be on, so, be warned!

So, without further ado: I write my own code to generate the deployed artifact. In my case, I take all the wheels installed in my environment, extract them, and merge them into a single wheel. The process also usually involves removing a bunch of junk from the packages packaged in such a way. You'd be surprised how much nonsense people put in their distributed packages... like, their unit tests, or documentation in HTML / PDF format, __pycache__ files (together with the sources)... the list goes on.

But, it works because I curate what's being installed. I don't trust pip to install just or everything I need. I run it in a separate environment, where I examine the packages that have been installed as dependencies, figure out why any of these packages were installed (you'd be surprised how often you don't need them!), then, I make a list of the dependencies I actually need, with the exact versions and checksums, and use a Python or a Shell script to download and install them in the actual development environment.

This isn't a good idea when you have many short-lived projects, but, in my case, the typical project lifespan is measured in decades and there aren't that many of them. So, I can expend the extra effort required to do that.

Unfortunately, I don't think there's a way to automate the process. The key point is that there's a human who sifts through dependencies and figures out what to do with them. Partially automate, maybe... but I can't think of a way to make this into a program that I could give someone.

> virtualenv isn't relocatable out of the box, so how else would you deploy a python project?

My team has a handful of Python projects. Here's how they work:

devenv.nix provides a Python runtime and all native dependencies, git hooks for linters and things like this. It integrates with direnv and the Python package manager (currently Poetry 1.x for older projects and uv for newer ones) so that when you cd in you get a virtualenv with everything you need, scripts in the project (or stubs for them) magically appear on your PATH so you don't need to use `uv run` or whatever it is for anything.

flake.nix provides a publishable artifact for projects that we run on workstations or servers. It autogenerates a Nix package from pyproject.toml and friends. You can reproducibly build it across platforms without virtualization, you can push it up to a binary cache and avoid source builds, whatever. It's great.

For projects that we run in cloud-native containers (for us AWS Fargate and AWS Lambda), we don't currently ship our own container images. We just publish zip files that we generate with a Poetry plugin that runs builds inside containers that have the same images as are used by AWS in its default runtime environments and push them up with the AWS CLI. The exact steps are stored as a Devenv script so the CI can be a one liner and you can run everything locally just like you would in CI.

> the python community did nothing

Python sucks.

But you can still represent your Python project as a proper Python package and get reproducible-ish build artifacts that are local-first and embrace Python-native tooling and ship it up to prod in a portable format with or without Docker. It only takes one engineer spending a day or two to work it out once for the whole team or maybe the whole company. You just need someone to be willing to RTFM on a package manager or two. The Python community seems to be largely lacking such people but your team doesn't have to be.

but is docker the solution, though? I don't think so, docker itself is prone to supply chain attacks from my understanding