Hacker News
by lilyball 3696 days ago
Something that I'm surprised this page doesn't talk about, and which is very important considering the recent hubbub over left-pad, is that any dependency you get on Cargo can be relied upon to continue to exist forever (well, as long as the crates.io site still exists, but if that goes away so does the Cargo index). The reason is that you can't ever remove a published version of your crate from crates.io. You can yank a version, which tells Cargo not to allow any projects to form new dependencies on that version, but the version isn't actually deleted and any projects that have existing dependencies on that version will continue to be allowed to use it. This is documented at http://doc.crates.io/crates-io.html#cargo-yank.
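
To make the yank behavior concrete, here is a minimal sketch in Python. It is a toy model, not how crates.io or Cargo is actually implemented; the `Registry` class, its method names, and the crate name are all hypothetical. It only illustrates the rule described above: a yanked version is hidden from new dependency resolution but stays downloadable for projects that already depend on it.

```python
class Registry:
    """Toy append-only registry (hypothetical; not the crates.io implementation)."""

    def __init__(self):
        self._versions = {}   # (name, version) -> source bytes
        self._yanked = set()  # (name, version) pairs that have been yanked

    def publish(self, name, version, source):
        key = (name, version)
        if key in self._versions:
            # Append-only: a published version can never be replaced.
            raise ValueError("version already published")
        self._versions[key] = source

    def yank(self, name, version):
        if (name, version) not in self._versions:
            raise KeyError("no such version")
        # Nothing is deleted; the version is only flagged.
        self._yanked.add((name, version))

    def candidates(self, name):
        """Versions a project may choose when forming a NEW dependency.

        Sorted as plain strings, which is good enough for this toy example.
        """
        return sorted(
            v for (n, v) in self._versions
            if n == name and (n, v) not in self._yanked
        )

    def download(self, name, version):
        """Projects with an existing (locked) dependency can still fetch it."""
        return self._versions[(name, version)]


registry = Registry()
registry.publish("example_crate", "1.0.0", b"first release")
registry.publish("example_crate", "1.0.1", b"second release")
registry.yank("example_crate", "1.0.1")
```

After the yank, `registry.candidates("example_crate")` no longer offers `1.0.1`, while `registry.download("example_crate", "1.0.1")` still returns its source.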
9 comments

> Something that I'm surprised this page doesn't talk about, and which is very important considering the recent hubbub over left-pad, is that any dependency you get on Cargo can be relied upon to continue to exist forever

Maybe the reason why it's hardly talked about is because it's common sense and pretty much all dependency managers support it?

Except node of course, because they have no idea what they're doing.

Except not all dependency managers support it. I'm not even sure if it's safe to say that a majority of dependency managers support it.
Package persistence is not really a dependency manager feature though, it's a package repository feature. npm didn't fail; npmjs.com did.

I'm just waiting for this to happen to bower next. AFAIK they're just a registry pointing to github. All it's going to take is someone doing a force push without thinking and we're in this same situation again.

> Except node of course, because they have no idea what they're doing.

NPM and node are two different things.

..his point stands.
This is incorrect and very rude. Rubygems has also implemented an automated delete feature. Rubygems and NPM have orders of magnitude more users than crates.io; once crates.io approaches that scale the team will have to allow users to delete crates or somehow find funding to field these kinds of support tickets.
the parent comment might have been worded better, but it has a point, i.e. removing published code generally has to be supported in some way to handle special situations, e.g.

* people publish stuff inadvertently (e.g. private information/keys)

* people publish stuff they are not allowed to (e.g. copyright and trademark violations)

* people publish stuff you do not want to see published (e.g. stuff related to breaking the law or ethical norms)

It can be argued that users must not do that or that crates.io doesn't have to oblige, but if they, for example, get a DMCA notice they'll still have to.

I think the key is that crates.io (and rubygems.org, nuget.org, etc.) as repo owners have to own the removal operation themselves and not delegate that to the package owner/maintainer. The repo owner is better positioned to take package consumers' needs into account and make good decisions about when and how to remove a package.

As far as satisfying support requests: obviously exceptional stuff like DMCA gets handled, but package owners publishing keys, etc. is still their responsibility and shouldn't accelerate or even guarantee removal IMO. If you publish secrets, it's not crates.io's responsibility to help you hide that mistake, and you need to be changing those secrets anyway.

> I think the key is that crates.io (and rubygems.org, nuget.org, etc.) as repo owners have to own the removal operation themselves and not delegate that to the package owner/maintainer. The repo owner is better positioned to take package consumers' needs into account and make good decisions about when and how to remove a package.

What does this mean? Rubygems and npm both allow a full unpublish, not only a yank. crates.io does not. Rubygems and npm are the same in this regard.

Remember that we are not talking about npm unpublishing someone's library without their approval (that's a different issue), we are talking about npm allowing a user to unpublish a library, which crates.io does not allow a user to do. I will repeat, because people seem not to believe it: rubygems allows this as well, specifically because the maintainers of rubygems could not handle the support tickets that resulted from not providing this feature.

The idea that this is some amateurish aberration from npm is a myth. If Rust is lucky, someday crates.io will have to choose between paying someone to field these tickets or letting users unpublish code.

the problem with saying "users should not publish their ssh keys" is that they will still do it and ping you with requests to remove them even if you have said it's not possible to do it, causing unnecessary support work.

That is, AFAIU, the reason the rubygems.org maintainers allow it now.

http://blog.rubygems.org/2015/04/13/permadelete-on-yank.html

Except they even state:

"If you’ve pushed a gem with internal code, you still need to reset API keys, URLs, or anything else sensitive despite the new behavior."

And:

"...we’ve been using an Amazon S3 bucket to store the gems for years now with versioning on - so if someone does remove gems that are necessary, we can easily restore them."

So what they've really done is given developers the illusion that the unwanted gem has been removed, while introducing the ability to break everyone's workflow just like npmjs. In some ways this is worse than before; devs still need to change secrets, and if it's non-secret sensitive code they are concerned about, it's still 'out there' and the dev still has to trust that the rubygems.org people don't do something unwanted with it.

> If you publish secrets, it's not crates.io's responsibility to help you hide that mistake

If you publish secrets, you're causing crates.io to perform copyright violations. It's not just about helping you hide the mistake, but about helping to stop further violations.

You shouldn't need to resort to a formal DMCA request to stop copyright violations.

How is publishing a secret (I mean passwords, ssh keys, etc.) a copyright violation?
How could my comment have been worded better? It is incorrect to claim that npm is alone in allowing automated unpublish, and rude to suggest the reason they allow it is because they do not know what they are doing, when in fact the reason they are doing it is because they have far more users than any other package manager. My comment was completely true and not at all aggressive.
> You can yank a version, which tells Cargo not to allow any projects to form new dependencies on that version, but the version isn't actually deleted

That's how RubyGems used to work, except you could contact the support team and ask them to permanently delete your gem version if something really sensitive and irrevocable was put into it. They had to change that due to their support log getting too big: http://blog.rubygems.org/2015/04/13/permadelete-on-yank.html

Cargo's policy is that if you upload secrets, you need to change the secrets because the code can't be deleted. From the aforelinked page:

> A yank does not delete any code. This feature is not intended for deleting accidentally uploaded secrets, for example. If that happens, you must reset those secrets immediately.

Even if they're given a court order to take a version down? It may be their policy, but I guarantee it will happen at some point.
We will comply with the law where required.
Court orders will be a lot less frequent than normal user requests so the overhead to the team won't be as high.
Of course, and I totally agree with their approach. I'm just saying the narrative eridius is pushing, that somehow they can make assurances about things never being deleted, is totally false. For example, I'm sure if someone somehow put child pornography in a Rust crate it would rightly be taken down pretty fast (and not require a court order).

Also I just wanted to give a little history because eridius's original comment made it sound like it was a novel concept.

This seems a bit... inflexible. I definitely understand the arguments for not breaking builds and for reducing administrative overhead and such, but not every bit of secret data can be revoked like a key/credential can. What if you accidentally include user data, or proprietary business-logic code, or...? (Yes, with proper data hygiene and processes you'd never even come close to doing any of that, but it seems there should still be an escape hatch.)
And of course, a centralized site can still be compelled to remove a package via legal action if push comes to shove.

  > if that goes away so does the Cargo index
Not so, the location of the index is independent of crates.io. Currently the index itself is just hosted on Github, whereas all of crates.io is hosted on S3. Which is actually kind of a pain sometimes, since if Github goes down it means that Cargo won't be able to find the index and I don't know if there's an easy way to override the index check. In the future I expect Cargo will gracefully continue if the index can't be updated due to connection failure, though I'd think it would prompt the user to make sure they're aware that their local copy of the index might be out of date.
Ah, good to know. I assumed the index was hosted as part of crates.io but I didn't actually bother to check. The point I was trying to make (even if I didn't communicate it properly) was that if crates.io goes away for good, the index will too and so it won't really matter that the code is inaccessible because cargo won't know where to find it anyway. Of course, based on what you said, it's certainly possible for everyone involved in crates.io to get hit by a bus and crates.io vanish when the S3 bill goes unpaid while still leaving the index up on GitHub, but if crates.io is taken down intentionally then presumably so will the index.
I know it wouldn't really fix issues like left-pad, but I would really like namespaced packages similar to the way Github does them. I think group ownership would be more explicit and understood. Top level packages encourage squatting and small/old/unmaintained packages getting names that people would mistake for something else. I understand a lot of that is on the user of the library to research before downloading, but intuitiveness is a virtue. It's true packages could just be called e.g. reactjs-node-bridge or something as opposed to reactjs/node-bridge, but anyone can prefix deceptively.

Cargo addresses the namespacing concern here http://internals.rust-lang.org/t/crates-io-package-policies/...

I think this is the scenario that has me feeling warm and fuzzy about vendoring. Sure, eventually even vendored dependencies will become stale, but I don't have to rely on a package manager past the initial delivery. It's not perfect, but I don't sweat code disappearing from a black box I don't own.
It does mention it, briefly:

> This is enough information to uniquely identify source code from crates.io, because the registry is append only (no changes to already-published packages are allowed).

> well, as long as the crates.io site still exists, but if that goes away so does the Cargo index

That makes me wonder: Is it easy or possible to replace crates.io with a self hosted repository?

Background of the question is that I know of a company where access to the standard public maven repo is forbidden. They use a commercial repository provider but I don't know if it is hosted on premises.

It's not easy, but it's possible. Everything is open source, and Cargo can easily be pointed at whatever host you want.

The feature that I'd like to see but haven't found the time to implement is delegation: "look at this index first, but if it's not here, go look at this other one". Right now, if you spin up your own crates.io and point Cargo at it, it won't have any packages... which works for some people, but not others.
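
As a rough illustration of the delegation idea ("look at this index first, but if it's not here, go look at this other one"), here is a minimal Python sketch. It does not describe an existing Cargo feature; the `lookup` function, the index contents, and the crate names are all made up for the example, and each index is modeled as a plain dictionary.

```python
def lookup(name, indexes):
    """Return (index_label, entry) from the first index that knows `name`.

    `indexes` is an ordered list of (label, mapping) pairs; earlier
    entries take priority over later ones.
    """
    for label, index in indexes:
        if name in index:
            return label, index[name]
    raise KeyError(f"{name!r} not found in any index")


# Hypothetical index contents: crate name -> latest version.
private_index = {"internal-utils": "1.4.0"}
public_index = {"some-public-crate": "0.1.0", "internal-utils": "9.9.9"}

# The private index is consulted first, the public one is the fallback.
indexes = [("private", private_index), ("public", public_index)]
```

Note that with this ordering a name present in both indexes always resolves to the private copy, which is a design choice a real implementation would have to make explicitly.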

> "look at this index first, but if it's not here, go look at this other one"

I agree that this would be very helpful for a lot of people, but it is kind of the opposite of what I was asking about.

As far as I understand it the commercial repos try to solve two main concerns:

1. license compliance

2. security

Better performance and reliability are just additional benefits.

I only know the details for a certain Fortune 500 company. They don't want the builds to fetch packages from a site they don't control, and they certainly don't want "if it's not here, go look at this other one". The idea is more control over where the packages come from, not more flexibility.

I think if Cargo doesn't provide a way to support alternative (possibly commercial) repos, it would be an obstacle to the adoption of Rust in the corporate world.

If you just want to ignore the broader OSS ecosystem, and run your own version of Crates.io behind a firewall, that's 100% supported today. The only real issue is that how to do so isn't particularly well-documented.
Sorry for pestering again but I think this is kind of important and I haven't made myself entirely clear yet (English is not my first language).

In maven the repo URL is configurable in settings.xml. This URL can be different for different departments or even different projects.

From what I see in the cargo source the crates.io URL is hard coded. So the DNS is the only level of redirection we have. Using varying IP addresses for crates.io for different departments or even projects wouldn't fly, at least not in the world I live in.

>ignore the broader OSS ecosystem

It's not about that either because the commercial repos contain very much the same OSS packages as the standard repo but don't present all of them to everyone all the time. Take for example a car company: GPL3 for in-house projects that are just used by the employees is discouraged but somewhat tolerated. GPL3 for projects that run in the car is a big big no no. You want to be certain that no dev ever introduces GPL3 source into anything that is in the car. You want your build to fail if any of your packages change their license to GPL3. You want your build to fail if any of your packages has a known vulnerability.
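
The kind of build gate described here could look roughly like the following Python sketch. Everything in it is an assumption for illustration: the package records, the license identifiers, the advisory set, and the function names are hypothetical and not taken from any commercial repository product or from Cargo.

```python
class PolicyViolation(Exception):
    """Raised when a build uses a package the policy forbids."""


def policy_problems(packages, banned_licenses, known_vulnerable):
    """Return a list of human-readable policy problems (empty if none).

    packages:         list of dicts with "name", "version", "license"
    banned_licenses:  set of license identifiers not allowed in this project
    known_vulnerable: set of (name, version) pairs with known vulnerabilities
    """
    problems = []
    for pkg in packages:
        ident = f'{pkg["name"]} {pkg["version"]}'
        if pkg["license"] in banned_licenses:
            problems.append(f'{ident}: banned license {pkg["license"]}')
        if (pkg["name"], pkg["version"]) in known_vulnerable:
            problems.append(f"{ident}: known vulnerability")
    return problems


def enforce(packages, banned_licenses, known_vulnerable):
    """Fail the build by raising if any package breaks the policy."""
    problems = policy_problems(packages, banned_licenses, known_vulnerable)
    if problems:
        raise PolicyViolation("; ".join(problems))
```

Because the policy is passed in as arguments, an in-house tool and an in-car project could run the same check with different `banned_licenses` sets.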

I know Cargo is not maven, but I believe this is a feature which is crucial for industry adoption. I think I will just add a feature request for this on GitHub.

  > Sorry for pestering again
No worries! This thread is a bit old but I'll try to pay attention to it.

  > From what I see in the cargo source the crates.io URL is hard coded.
It's not: http://doc.crates.io/config.html#configuration-keys

TL;DR:

  [registry]
  index = "URL_GOES_HERE"
and you're good.

  > It's not about that either because the commercial repos contain very
  > much the same OSS packages as the standard repo but don't present all 
  > of them to everyone all the time.
Ahh yeah. What I mean is, you'd have to set up the packages in that registry yourself. Which sounds like what they'd want to do, so seems fine.
What would be useful is a sort of "caching proxy" that could have various knobs to handle situations like:

- crates.io is down

- crates.io says this cached package doesn't exist.

- etc...

This is already possible today with existing caching proxies. This is a great way to make your CI/builds more reliable and quicker.
I meant a caching proxy that functioned as a mini Crates.io in the absence of actual crates.io being up. Depending on the crates.io protocol, just caching HTTP requests might not be enough, but also acting as an (offline-able) middle man that knows the protocol gives rise to other knobs and such (e.g. a configurable blacklist of packages).
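
A very rough Python sketch of such a protocol-aware proxy follows. It says nothing about the real crates.io protocol: the `CachingProxy` class, the `UpstreamDown` exception, and the `fetch_upstream` callable are all hypothetical, and the upstream is assumed to raise `UpstreamDown` when unreachable and `KeyError` when it claims a package doesn't exist.

```python
class UpstreamDown(Exception):
    """Raised by the (hypothetical) upstream fetcher when it is unreachable."""


class CachingProxy:
    def __init__(self, fetch_upstream, blacklist=()):
        # fetch_upstream: callable(name, version) -> bytes
        self._fetch = fetch_upstream
        self._cache = {}                  # (name, version) -> bytes
        self._blacklist = set(blacklist)  # package names never served

    def get(self, name, version):
        if name in self._blacklist:
            raise PermissionError(f"{name} is blacklisted")
        key = (name, version)
        try:
            data = self._fetch(name, version)
        except (UpstreamDown, KeyError):
            # Upstream is down, or says the package doesn't exist:
            # serve the cached copy if we have one, otherwise give up.
            if key in self._cache:
                return self._cache[key]
            raise
        self._cache[key] = data
        return data
```

Whether a cached copy should still be served after upstream reports the package missing is exactly the kind of knob mentioned above; this sketch hard-codes "yes".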
What would be really nice is a system like the one Debian, Ubuntu, etc. use, allowing for official (and unofficial) mirrors.
But that's really a crates.io feature, not cargo's. You could easily deploy your own repository which serves everyone a random file when you pull "some_crate-1.2.3".
Let's just see what happens when they get their first court order, which they certainly will (due to their global name-spacing).
As I said below, we will comply with the law.

(And I'm not sure what that has to do with namespacing.)

I believe he means that packages aren't namespaced (owner/package-name), so if I publish a package called nike, Nike could come and want to take over the package name.
Global namespacing has nothing to do with that. Even with per-user namespacing, there's nothing to stop a user from making their username "nike" and publishing all their code under "nike/package-name", which would be subject to just as much potential legal action as a package named "nike" itself. Likewise there's also nothing to stop Bob from uploading "bob/nike".
Of course global name-spacing has something to do with it!

What's the difference between a "username" resulting in nike/package-name and a package called nike-package-name?

Not much in practice. Name-spacing doesn't mean "add another made up name entry somewhere".

That's why people who thought about these issues out-sourced this concern completely: Prove your ownership of nike.com and you can publish as nike.com.

Suddenly 99.9% of the trademark issues are gone, handled by registrars.

I think this approach is pretty obvious and I find the lack of thought coming from rust devs deeply concerning.

I'm not sure what you're trying to say. The problem exists independently of whether or not global namespacing is implemented by a package repository. Nike would have just as much authority to remove a "nike" package in a global namespace as it would to remove "bob/nike" (which is to say, almost no authority whatsoever).

I have no dog in the fight over whether or not a global namespace is a good idea, but it's tiresome to see irrelevant arguments being trotted out against global namespacing.

It wouldn't likely meet the standard of "confusingly similar" unless you called it "Nike" in order to somehow leverage the actual shoe brand (logos 'n all). But even if it didn't, you could imagine an index operator succumbing to the (unreasonable) request of a lawyer in order to avoid a conflict requiring representation.