by blibble 1217 days ago
why does pypi/pip still not have namespacing?

Maven sorted this out 20 years ago

what's a bit sad is that the Python Packaging Authority's survey from a few months ago seemed to be mostly interested in vision and mission statements

rather than building a functional set of tools


Namespacing is not a security boundary: it's a usability feature that helps users visually distinguish between packages that share the same name but different owners. I don't think it would meaningfully affect things like package index spam, which this is.

(This is not a reason not to add namespacing; just an observation that it's mostly irrelevant to contexts like this.)

obviously, but it allows delegation of trust onto other systems (like the DNS)

example: the package named "aws" on pypi was created by some random guy and has been abandoned for years

if pypi/pip supported namespacing that would be info.randomdude.aws instead

and amazon's packages would be under com.amazon

not being able to namespace internal packages is another security issue that is substantially improved with proper namespacing

to be blunt: not supporting it at this point is reckless and irresponsible

(I note you're part of pypa!)

DNS isn't a particularly secure root of trust; Java is somewhat unique among package ecosystems for picking it as their trust anchor.

It also just kicks the can down the road: Amazon is the easy case with `com.amazon`, but it isn't clear a priori whether you should trust `net.coolguy.importantpackage` or `net.cooldude.importantpackage`. These kinds of trust relationships require external communication of a kind that package indices are not equipped to supply, and should not attempt to solve haphazardly.

> (I note you're part of pypa!)

I am a member of PyPA, but I don't represent anyone's opinions but my own. It's a very loose collection of projects, and it would be incorrect to read a general opinion from mine.

I will note even namespaces for package management that don’t use DNS are a big step up over none.

For example in PHP/composer/packagist and node/npm they just have a vendor name that can be reserved.

It makes it very easy to distinguish “this package is from the (trusted vendor name here)” and prevents issues with namesquatting.

> Amazon is the easy case with `com.amazon`, but it isn't clear a priori whether you should trust `net.coolguy.importantpackage` or `net.cooldude.importantpackage`

this is a classic example of not letting perfect be the enemy of good

there is no perfect solution, there never will be

piggybacking off of DNS works extremely well for Java and Go (and the tooling is a pleasure to work with)

meanwhile Python continues to be a complete disaster

I agree there is no perfect solution. But I want a good solution, and I disagree that DNS is a good one.
I look forward to another 20 years of no progress!
I like the way golang handled this. Imports are the URL to the resource. No central distribution mechanism at all. In the past few years they implemented an optional caching layer, so a dependency going offline doesn't necessarily mean that it is unavailable anymore.
Who's to say mr randomdude won't claim com.amazon first?
Let's Encrypt solved this by doing a proof of control over the domain name, and in an automated way.

PyPI could do this. Or, they could require that someone demonstrate proof of ownership for a namespace by signing it with a certificate tied to the domain name (so you couldn't claim the com.bigco namespace without having the certs, which you can't get without owning that domain). There could even be signature requirements/proof for each package and/or version uploaded.
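
To make the proof-of-control idea concrete, here is a minimal sketch of a DNS TXT challenge, loosely in the spirit of what Let's Encrypt does. Everything in it is an assumption for illustration: the record name `_index-challenge`, the `index-namespace-challenge=` value format, and the `lookup_txt` callable are hypothetical, and nothing like this exists in PyPI today. The DNS lookup is passed in as a function so the sketch runs without network access.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Iterable


def reverse_dns_to_domain(namespace: str) -> str:
    """Turn a reverse-DNS namespace such as 'com.bigco' into 'bigco.com'."""
    return ".".join(reversed(namespace.split(".")))


def issue_challenge() -> str:
    """Random token the index would hand to the account claiming a namespace."""
    return secrets.token_urlsafe(32)


def expected_record(token: str, account_id: str) -> str:
    """TXT value the claimant must publish (hypothetical format)."""
    digest = hashlib.sha256(f"{account_id}:{token}".encode()).hexdigest()
    return f"index-namespace-challenge={digest}"


def verify_namespace(
    namespace: str,
    account_id: str,
    token: str,
    lookup_txt: Callable[[str], Iterable[str]],
) -> bool:
    """Return True if the domain behind `namespace` publishes the expected record.

    `lookup_txt` stands in for a real DNS resolver: it takes a record name
    and returns the TXT strings found there.
    """
    domain = reverse_dns_to_domain(namespace)
    want = expected_record(token, account_id).encode()
    records = lookup_txt(f"_index-challenge.{domain}")
    # Constant-time comparison; bytes avoid errors on non-ASCII records.
    return any(hmac.compare_digest(want, rec.encode()) for rec in records)
```

With a dict-backed fake resolver, `verify_namespace("com.bigco", "alice", token, lookup)` succeeds only when `bigco.com` carries the record derived from alice's token, so a different account cannot reuse it.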

I would need to spend money to purchase a domain and some kind of server before I can publish a python module? That doesn't seem right. And I presume I would need to keep paying for it as long as I want my modules available and verified. Attaching required monetary purchases to an open source ecosystem is not a good idea.
Supporting namespacing does not preclude keeping the old system too, or having a public repo namespace like org.pypi or whatever that allows people to upload packages to the current repo using the system they currently have. Might help sort out some of the other packaging problems too - LWN had this the other day: https://lwn.net/SubscriberLink/923238/d48af5401c04db7d/ . Maybe it would help with the integrator notion org.conda or whatever.

Depending on how something like this is implemented, maybe com.github could set it up to pull straight from the project repo.

Just because there's ways it could go poorly, doesn't mean it will go poorly.

Well, in theory you could have a namespace schema that differentiates between user-submitted and organization-submitted packages such that randomdude's would appear as 'public.randomdude.aws' and organization-owned namespaces verified by a DNS record would appear as 'com.amazon.aws'
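
A rough sketch of what such a schema could look like, assuming a reserved `public` prefix for unverified users and reverse-DNS namespaces for owners with a verified domain. The names `QualifiedName`, `qualify`, and `render` are made up for illustration, and how the domain gets verified is out of scope here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical reserved prefix for packages without a verified domain.
PUBLIC_PREFIX = "public"


@dataclass(frozen=True)
class QualifiedName:
    namespace: str
    package: str
    verified: bool


def qualify(package: str, owner: str, verified_domain: Optional[str] = None) -> QualifiedName:
    """Build a namespaced name for `package`.

    If the owner has a verified domain, the namespace is that domain in
    reverse-DNS order; otherwise it falls under 'public.<owner>'.
    """
    if verified_domain:
        namespace = ".".join(reversed(verified_domain.lower().split(".")))
        return QualifiedName(namespace, package, verified=True)
    return QualifiedName(f"{PUBLIC_PREFIX}.{owner.lower()}", package, verified=False)


def render(name: QualifiedName) -> str:
    """Full dotted name as a user would see it."""
    return f"{name.namespace}.{name.package}"
```

Here `render(qualify("aws", "randomdude"))` gives `public.randomdude.aws`, while `render(qualify("aws", "amazon-team", "amazon.com"))` gives `com.amazon.aws`.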
You could in principle do proof-of-ownership checks like Google does for things like Webmaster Tools, so you’d need to control a domain to have the corresponding namespace.
It's much easier to correct the ownership of a single namespace than N packages in the global namespace
It can be if you implement it to be so. Just let people create an allowlist of approved vendors for their organization or project from those namespaces. This handles not having to approve individual packages from trusted entities like Google, Microsoft, etc. Update the list when new vendors are needed. Reuse elsewhere as necessary.

Maybe the list can be hosted on an internal server for other employees to reuse. Hosting all the packages internally is overkill. Trusting the world by default is overkill.

Now "pip install gooogle/package"

"Hey User, gooogle/package is not from a trusted namespace. Did you mean google/package which is similar and trusted? Or would you like to add gooogle to your local trust file?"

The lack of any kind of curated feeds that only list verified or popular packages is a tragedy. There should be a reasonable way of allowing clients to protect themselves from a typo.
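
The allowlist prompt described above could be sketched roughly like this. It is only an illustration: pip has no `namespace/package` syntax or trust file today, `check_requirement` is a made-up helper, and the 0.8 similarity cutoff is an arbitrary choice. It uses `difflib.get_close_matches` from the standard library to suggest a trusted namespace that looks like the one typed.

```python
import difflib


def check_requirement(requirement: str, trusted: set[str]) -> tuple[bool, str]:
    """Check a hypothetical 'namespace/package' requirement against an allowlist.

    Returns (allowed, message). When the namespace is not trusted but closely
    resembles a trusted one, the message suggests the trusted alternative.
    """
    namespace, _, package = requirement.partition("/")
    if not package:
        return False, f"{requirement} has no namespace."
    if namespace in trusted:
        return True, f"{requirement} is from a trusted namespace."
    # Sort for a deterministic suggestion; cutoff 0.8 is an arbitrary threshold.
    close = difflib.get_close_matches(namespace, sorted(trusted), n=1, cutoff=0.8)
    if close:
        return False, (
            f"{requirement} is not from a trusted namespace. "
            f"Did you mean {close[0]}/{package}?"
        )
    return False, f"{requirement} is not from a trusted namespace."
```

With `trusted = {"google", "microsoft"}`, the requirement `gooogle/package` is rejected with a suggestion of `google/package`, while an unrelated namespace is rejected without one.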

Namespacing is a lot more than just a theoretical name collision avoider.

Good namespacing (e.g. in Go), in practice, provides critical context about the development/publication of a software package.

Every lang-ecosystem needs to re-implement CPAN the hard way.
That's their calling card. Long discussion threads, mails spanning whole pages, silencing opposition.

But deliver anything more streamlined and secure? Hell, no!