The whole world of package maintenance is subject to this, from pip to C# packages, npm, everything. I'm sure all of us developers have some of these on our systems, because we rely on downloading packages that in turn depend on other things.
It makes me want to do development in a dummy account, with my 'personal account' (the one holding my passwords and ssh credentials) kept entirely separate. I never do that in practice because it would be too much of a pain. That's why these kinds of package attacks are valuable.
So what can we do to address this? This story lines up exactly with my long-time worry. I could be careful or lucky and avoid things like this, but the packages I use might not be so careful or lucky.
> The whole world of package maintenance is subject to this
Wrong.
There are two types of package repository, "maintained" and what I call "wild west". The latter include PyPI, npm, Homebrew, Docker Hub, and any other repos where any old joe can sign up and start uploading packages under some name they choose. Uploaded packages are controlled by the single entity in charge of the account, except in rare circumstances like this one, when the site owners are alerted to specific mischief.
"Maintained" repositories have a layer of "maintainers" between the developers and the users. Their responsibility is to shield the user from irresponsible, user-hostile or potentially malicious decisions the developers may (and surprisingly often do) make. Examples include most Linux distribution repositories, but also others like Nix and Guix. They tend to have fewer packages because of the added work of performing the maintenance, and they tend to lag behind release versions for the same reason, but also because of an inherent conservatism among the maintainers. In return, users get greatly improved stability. In the best cases (e.g. Debian) the maintainers even backport security fixes to older stable releases. The maintainers also make decisions on a more public, consensus-driven basis and are better able to coordinate releases between different packages to ensure compatibility.
Given the choice, I run a mile from the wild-west style of repository.
Honestly, this is kind of FUDish and assumes the worst case scenario for upstream developers, and the best case scenario for the distro maintainers.
Another way of phrasing what this extra layer of maintainers provides is a second group of people who can introduce their own irresponsible, user-hostile, and potentially malicious (or at the very least, negligent) decisions. Worse, oftentimes these maintainers have less (in some cases, far less) knowledge of how the code itself works, and are applying their own patches, often with minimal testing, without fully understanding the scope or impact of the changes they're making. For every poor decision you can find in a package that is popular enough to even appear in one of these downstream repositories, one could just as easily find a case where this extra layer is introducing their own problems.
The non-FUDish answer is that whether you get your software directly from the upstream developers through an uncurated repository like PyPI, or through a curated repository like a Linux distribution's, neither one is inherently better than the other. Each has a variety of pros and cons, and part of modern-day engineering is looking at these tradeoffs and choosing the right set for your particular situation. Sometimes that will even mean choosing different tradeoffs for different packages on the same system.
> one could just as easily find a case where this extra layer is introducing their own problems.
In many years of using both, I have found problems introduced by the maintainer layer to be much rarer. The example that everyone likes to pull up is Debian and OpenSSL, and that dates from 2006, when attitudes to security were very different.
The major difference between maintainer decisions and developer decisions is that maintained repositories tend to be consensus-based, whereas wild-west repositories are my-way-or-the-highway. If a maintainer makes a poor decision, there's the opportunity to engage with the community and make your case. You don't tend to get rogue maintainers who ignore the rest of the maintainers; such people get removed.
I've never seen a user-hostile decision made by a package maintainer.
As for testing, we're actually getting to the point where maintained distributions are better tested than the vanilla packages. Nix, for instance, makes efforts to enable tests during the build process of most packages, meaning that when you get an installed package, it is known to pass its tests with its actual set of installed dependencies (which themselves should have all passed their tests). This is moving towards having "integration tested" packages.
With pip, the best you'll get these days is a warning message that it might be installing the wrong versions of things because of a version conflict (and that's a recent addition). No tests run, best of luck...
Another example would be how pip and venv have some edge-case failures on Debian or Ubuntu. Repackaging is a constant struggle, especially when the software adds new features that are not thoroughly tested by the packager.
Another caveat to consider is that people tend to blame the first party (Python developers) when this happens, while the problem is really caused by packagers. Bad things may happen less often with a packager in the middle, but when they do, it’s generally a lot more difficult to deal with exactly because of the added layer and complexity.
My last 3 jobs used pip and npm; they are ubiquitous. My current job builds infrastructure and uses both of those too. Sure, we use officially maintained Ubuntu repos, but it's incredibly common to use npm or whatever in combination with the safe package systems.
The key would appear untrusted, though. Which, admittedly, is not much of a defense: most developers would just assume it’s a new key and trust it blindly. Still, it would be an additional data point, and possibly another chance to spot the typo (if the UI repeats it).
That's why you check the signature to make sure it's the right one. It's almost always listed on the developer's web site, plus you can often google the signature to see if it's one other people are using as well.
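For what it's worth, the comparison step is easy to get sloppy about by eye, so here is a minimal sketch of doing it mechanically. It assumes that what you're comparing is the key's full fingerprint in the common 40-hex-digit form (newer key formats use longer fingerprints, and this sketch would reject them). The function names and the sample fingerprint are made up for illustration.

```python
import string

FINGERPRINT_HEX_DIGITS = 40  # assumption: 160-bit fingerprint written as hex


def normalize_fingerprint(fp: str) -> str:
    """Drop spaces, tabs, newlines and colons, then uppercase."""
    return "".join(ch for ch in fp if ch not in " \t\r\n:").upper()


def fingerprints_match(published: str, observed: str) -> bool:
    """Return True only if both values are full-length and identical.

    Short key IDs (8 or 16 hex digits) are rejected on purpose,
    since a partial match is not evidence of the same key.
    """
    a = normalize_fingerprint(published)
    b = normalize_fingerprint(observed)
    for value in (a, b):
        if len(value) != FINGERPRINT_HEX_DIGITS:
            return False
        if any(ch not in string.hexdigits for ch in value):
            return False
    return a == b


# Made-up fingerprint, formatted the way a web page might show it.
published = "0123 4567 89AB CDEF 0123 4567 89AB CDEF 0123 4567"
# The same value as a tool might print it.
observed = "0123456789abcdef0123456789abcdef01234567"
print(fingerprints_match(published, observed))
```

This only tells you that two strings agree. Whether the published copy itself can be trusted is still the judgment call discussed above.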