Hacker News
by gw 2105 days ago
You really should look to other ecosystems and see what lessons they've learned. In java, packages are normally "namespaced" by the author's reverse domain name, like `org.lwjgl/lwjgl`.

Since clojure uses maven as well, the same applies, but clojure tools like leiningen decided to create a shortcut: if the group and artifact name are the same, like `iglu/iglu`, they can be collapsed into one name: `iglu`
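To make the shortcut concrete, here is a minimal Python sketch of the collapsing rule as described above. The function names are made up for illustration, and this is not how leiningen itself implements it; it only models the "group equals artifact" convention.

```python
from __future__ import annotations


def expand_lib_name(name: str) -> tuple[str, str]:
    """Expand a possibly collapsed library name into (group, artifact)."""
    if "/" in name:
        group, artifact = name.split("/", 1)
    else:
        # Collapsed form: the single name stands for both group and artifact.
        group = artifact = name
    return group, artifact


def collapse_lib_name(group: str, artifact: str) -> str:
    """Return the shortest spelling allowed by the shortcut rule."""
    return artifact if group == artifact else f"{group}/{artifact}"


print(expand_lib_name("iglu"))
print(expand_lib_name("org.lwjgl/lwjgl"))
print(collapse_lib_name("net.sekao", "iglu"))
```

Only names where group and artifact match get the short spelling, which is why the rule nudges authors toward picking such names.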

Well, that just encouraged everyone to choose collapsible names. In retrospect, this didn't buy us much. Who cares about saving a few characters of typing? Most now seem to agree it wasn't a good idea.

When the "collapsed" name falls out of maintenance, the forks will all seem somehow less "official", even if they are much higher quality. Forks are inevitable; why would you want to discourage them?

I finally decided to start using the reverse of my personal domain for my future libraries. The java folks were right all along.

7 comments

Agree, Go went a similar route as Java and I think that's good as well.

The new tools.deps in Clojure is actually moving to disallow collapsed names for similar reasons and will force iglu/iglu.

Here's a rationale from them:

> The groupId exists to disambiguate library names so our ecosystem is not just a race to grab unqualified names. The naming convention in Maven for groupId is to follow Java’s package naming rules and start with a reversed domain name or a trademark, something you control. Maven itself does not enforce this but the Maven Central repository does for all new projects.

> In cases where you have a lib with no domain name or trademark, you can use a third party source of identity (like github) in combination with an id you control on that site, so your lib id would be github-yourname/yourlib. Using a dashed name is preferred over a dotted name as that could imply a library owned by github.

Can I get a link to the quoted document? The quote raises more questions than it answers. Who determines that someone applying for a qualified name is the owner of that trademark (presumably this is a full-time employee; who pays their salary?), and what is the process? Trademarks are not a universal namespace--even within a single legal jurisdiction you can have the same name legally owned by different people due to different contexts--so who decides who wins?
The quote is from: https://insideclojure.org/2020/07/28/clj-exec/ under section "Deprecated unqualified lib names".

That said, yeah, this is unfortunately just best intentions. I'm guessing if you own a real trademark, you could actually sue people using your trademark as their group-id.

Otherwise, in general they recommend using a registered web domain name. Someone else could use your domain name as theirs, but I think that if you contacted the registry owner, like maven-central, and could show you own that domain, they might be able to take action against the impersonator. Same for a github user.

Actually, thinking about this, I feel it would be great if the repository owner, like maven-central, required a form of proof of ownership of the domain or the github id. That could add a lot of trust to the whole process.

> maven-central required a form of proof of ownership of the domain

They do. There's even a manual review procedure.

Oh really? Wow that's awesome I had no idea maven central did that.
The quote was a guideline, not a requirement. Cognitect (who makes the clojure CLI tool) doesn't even control clojars, the main clojure maven repo, so they wouldn't be able to enforce that even if they wanted to.
>Forks are inevitable; why would you want to discourage them?

The epitome of this mindset I think is "hostile fork" -- the entire notion is nonsensical. The whole point of being FOSS/OSS is the freedom to fork -- by all damn rights, you should fork as you please, and be pleased to fork!

The actual problem is not forking.. it's community fragmentation, and more importantly loss of a "source of truth". Of course, maintaining that source of truth is otherwise known as centralization, with all the problems that brings, but there's nothing inherently wrong with forking.. that's just the natural specialization and evolutionary processes at work.

The solution is to make it easier to find those top-tier libraries, and this is orthogonal to forking; mainly handled by blog posts and "official" library listings/recommendations, and things like This Week in Rust.... namespacing or not doesn't really get you anything there.

> "hostile fork" -- the entire notion is nonsensical.

It is not. It is based in experience. See the xMule/aMule fork. A hostile fork is when the fork project starts bad-mouthing the original project and its maintainers.

The notion that forking is by itself hostile is nonsense.

If I'm remembering right, something similar kind of happened with uBlock and uBlock Origin, but it was the original maintainer who came back and forked after the new maintainer became hostile, or something like that.
This was discussed in detail in Homesteading the Noosphere by Eric S. Raymond.

I think it's in there somewhere that he compares the right to fork with the right to bear arms: Good to have, but the situation must have gotten really shitty if a fork is a good solution.

https://firstmonday.org/ojs/index.php/fm/article/download/14...

We did look and learn; a lot of crates.io was informed by several of us having experience with CPAN, RubyGems, and npm. Both the good and the bad.

And also by the experience of going through GitHub hosting a RubyGems server and all the fallout that happened there.

I have less experience with CPAN and RubyGems but npm's namespacing system has two very serious problems:

1. It was introduced very late, meaning the community had already formed patterns of contribution around a flawed flat system. This is a problem of the flat system, not of the namespaced one.

2. It is still to this day entirely optional (for understandable backward compat. reasons). This gives namespaceless packages a misplaced position of authority over namespaced ones, which erodes the value of namespacing.

These are tough problems to get around if you start with a flat structure, but they really just outline the urgency of switching to namespaces for a relatively young project.

I agree with a lot of this perspective. It's also directly relevant to our situation, because we are basically in exactly that place now, and dealing with these problems is something that proponents of adding namespaces need to navigate.
This is the only good argument I've heard yet for not adding namespaces. And maybe it's a defeating argument, maybe Crates is doomed to not have namespaces due to the cost of putting them in after the fact.
I'm not sure you followed the above 2 points, or perhaps read them through tinted glasses.

I wasn't arguing that npm's namespacing system is worse than their initial system, nor that their switch to namespacing was a mistake.

The current npm namespaced system, with flaws, is head-and-shoulders better than the previous flat system.

You're saying you did "look and learn". If by that you mean you looked at the end product (npm's is seriously flawed) without looking at the journey to that product (npm's is still a huge improvement over what they started with), then you're not going to learn much from that kind of "looking".

I highlighted Composer/Packagist in a sibling comment as a system you should look and learn from (w.r.t. namespaces).

Choosing to only look at flawed systems that started flat seems like you're just being selective to support your own thesis.

PHP and its ecosystem have a lot of problems, but I think Composer/Packagist is a surprisingly exemplary example of how to go about structuring package management.
Except for the decision to not tackle solutions for native extensions.
Add a "legacy" namespace and move all existing packages there. Allow for a transition period where tooling will add "legacy" to instances where no namespace is given. Add a mechanism for legacy packages to indicate their new namespace so that transitioning could be mostly automatic for package users.

Not effortless, but not necessarily very costly either.
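A rough sketch of how that transition could look on the tooling side, in Python. Everything here is hypothetical: the `legacy` namespace name comes from the proposal above, while the `MOVED_TO` table and the package names are invented for illustration and do not reflect any real registry.

```python
LEGACY_NAMESPACE = "legacy"

# Hypothetical registry data: legacy packages that declared a new namespace.
MOVED_TO = {
    "legacy/foo": "alice/foo",
}


def qualify(name: str) -> str:
    """Add the legacy namespace when a dependency is given without one."""
    return name if "/" in name else f"{LEGACY_NAMESPACE}/{name}"


def resolve(name: str) -> str:
    """Qualify a name, then follow any declared moves to the new namespace."""
    qualified = qualify(name)
    seen = set()
    # Follow redirects, guarding against accidental cycles in the table.
    while qualified in MOVED_TO and qualified not in seen:
        seen.add(qualified)
        qualified = MOVED_TO[qualified]
    return qualified


print(resolve("foo"))      # unqualified name with a declared move
print(resolve("bar"))      # unqualified name with no declared move
print(resolve("bob/baz"))  # already namespaced, left untouched
```

Existing manifests keep working because bare names are rewritten on the fly, and users only need to act if they want to pin the new name explicitly.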

> We did look and learn; a lot of crates.io was informed by several of us having experience with CPAN, RubyGems, and npm.

Don't those package registries all suffer from not having namespaces? RubyGems in particular [1].

[1] https://thehackernews.com/2020/04/rubygem-typosquatting-malw...

Namespaced packages have almost the same problem

People can just have maliciously typo'd namespaces

Actually you could argue it's worse since people tend to pay more attention to package names rather than namespace names

Everything has upsides and downsides. There are downsides to the Rubygems approach.

However, typosquatting is an orthogonal problem to namespacing; you can still typosquat a namespace.

From the article:

> This is not the first time typosquatting attacks of this kind have been uncovered.

> Popular repository platforms such as Python Package Index (PyPi) and GitHub-owned Node.js package manager npm have emerged as effective attack vectors to distribute malware.

"Orthogonal" suggests no connection but what I see above is a list of package managers that don't have namespacing.

They didn't make the claim that the lack of namespaces had anything to do with this; that's an inference you're making from the specific list, when it could be for any number of reasons. For example, these are some of the largest package management ecosystems in the world, so they're more likely to be attacked than smaller ones. (You can of course come back and say that there are other massive ecosystems too, but that's kind of my point: there's more to a discussion than a random article listing a few ecosystems.)

I stated my reasoning in my comment: you can typosquat a namespace just as easily as you can any identifier. I don't see any inherent difference between the two.
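As a small illustration of that point, here is a Python sketch of a near-miss check based on plain Levenshtein distance. The helper names and the example identifiers are hypothetical, and no registry is claimed to work this way; the point is only that the same check fires whether the typo is in a flat name or in the namespace part.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete a character from a
                cur[j - 1] + 1,            # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def near_misses(candidate: str, known: list[str], max_distance: int = 1) -> list[str]:
    """Known identifiers that the candidate is suspiciously close to."""
    return [
        k for k in known
        if k != candidate and edit_distance(candidate, k) <= max_distance
    ]


# A flat name and a namespaced name are both just strings to the attacker.
print(near_misses("requets", ["requests"]))
print(near_misses("facebok/react", ["facebook/react"]))
```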

> "Orthogonal" suggests no connection but what I see above is a list of package managers that don't have namespacing.

correlation does not equal causation.

how is it not apparent that typosquatting is possible regardless of whether namespacing is in play?

for example, URLs are namespaced, and are the classic example of typosquatting: https://en.wikipedia.org/wiki/Typosquatting

I think what the GP wanted to say was along the lines of "look at _better_ approaches", not just "look at approaches".
Maybe. Regardless of what my parent meant, a lot of people in these discussions imply that we never looked at prior art because we did not make the choices around the tradeoff that they wanted us to make. And we did look at many, many approaches. We just decided to not go in those directions.
I wasn't even addressing the crates.io team, i was addressing the author of the post.
The team at Sun really got this right.

I recall an interesting precursor in Solaris/System V - the OS package guidelines recommended using your _stock ticker_ as part of the short name.

Java was my introduction to namespacing, so I only suspected but didn't know for a long time that Java overdid namespacing.

Companies change names, they merge. Sometimes they go out of business but stick around as a foundation stewarding their old projects, and you might be going to example.org for years for documentation on a com.example module.

And the namespaces weren't enforced (who is going to stop me from publishing a com.example.foo module?), so it expected much and delivered little.

No namespaces is bad. Five level namespaces are better, but still bad for different reasons. Two might be good. Some might prefer three. But zero is right out.

I agree, the Java namespace system isn’t that good. In fact I hate it. First because it uses reverse DNS while the common usage of URLs is in the opposite order. Second because the package sub-namespace is enforced with the file system structure, which makes for crazy long names.

On the other hand I really like how C# and dotnet in general handle the matter. Package namespaces are separated from logical (in-code) namespaces. Package namespaces are usually two or three dotted terms, making ownership clear while not bloating the names.

> First because it uses reverse DNS while the common usage of URLs is in the opposite order.

Well, that's more a bad thing about DNS though. "toplevel.domainname.subdomain/path" is really how it should be structured. Sun improved on this and made the hierarchy proper.

my understanding of DNS is that it actually uses ".tld.domain.subdomain" as a representation, it is just the URL format that mixes them up
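For anyone unsure what the reversal amounts to, a tiny Python sketch follows. The helper name is made up, and it only reorders labels; it does not handle Java's extra rules for characters that are not valid in package names.

```python
def reverse_domain(domain: str) -> str:
    """Reorder DNS labels so the most general one comes first."""
    labels = domain.strip(".").lower().split(".")
    return ".".join(reversed(labels))


print(reverse_domain("lwjgl.org"))
print(reverse_domain("docs.example.org"))
```

Applying it twice gives back the original name, so the two orderings carry exactly the same information; the disagreement is only about which end is more significant.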
The Maven Central repository will not allow you to publish a Java package com.example.foo.
> Forks are inevitable; why would you want to discourage them?

The conclusion that we don't want to discourage forks may be valid, but this doesn't seem to be good reasoning. Lots of things are inevitable that we want to discourage or delay.

> Well, that just encouraged everyone to choose collapsible names. In retrospect, this didn't buy us much. Who cares about saving a few characters of typing?

This appears to be using evidence to prove the opposite conclusion; if everyone voluntarily chose to use shorter names, then it means that everyone cares about having shorter names. If there is a more substantial argument for why people have decided that the collapsing was a mistake, I'd like to read it.

I didn't choose the shorter names because i "care[d] about having shorter names", i did so defensively, because i figured if i chose `net.sekao/iglu`, someone else would choose `iglu/iglu` which would imply that theirs was the original or official version.

Another point i didn't mention is that maven was designed from the start to be decentralized; many companies run their own private maven repos, but also pull artifacts from maven central. Having group names reduces the chances of collisions between their private servers and a public maven server.
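A toy Python sketch of the collision point, assuming a repo can be modelled as a set of coordinate strings. The artifact names and the `com.example` group are invented for illustration; real maven resolution is more involved than a set intersection.

```python
def colliding_coordinates(private_repo: set[str], public_repo: set[str]) -> list[str]:
    """Coordinates that appear in both repos and are therefore ambiguous."""
    return sorted(private_repo & public_repo)


# Without groups, an internal "utils" clashes with a public "utils".
print(colliding_coordinates({"utils", "auth"}, {"utils", "iglu"}))

# With groups, the same artifact names no longer clash.
print(colliding_coordinates(
    {"com.example/utils", "com.example/auth"},
    {"net.sekao/utils", "net.sekao/iglu"},
))
```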

That's a better rationale, although I don't think that really solves your stated problem; as an uninformed user I am still more likely to think that iglu/iglu is the more authoritative source there. Given this, any project that wants to authoritatively own its identifier should probably also register its own top-level namespace... which unfortunately brings us back around to where we started.
It would at least be far less of an issue. I don't see anyone being confused that https://github.com/facebook/react is the official repo, and not https://github.com/react/react. It's the fact that a collapsed name is a shortcut that imbues it with this special stature. And i believe maven central doesn't even allow one-segment group names for new libraries, though clojars obviously does.