by lobster_johnson 2943 days ago
Rust/Cargo had the luxury of being a greenfield project that could adopt semver from the beginning, whereas Go made the mistake of starting out, and then going years, without any official package management solution. As a result, Go has a swathe of applications and libraries that use specific workflows as well as a mélange of community-developed package management tools such as godep, Glide and dep. The semver standard, in particular, has been inconsistently adopted by the Go community.

In other words, any new Go tool either has to support/import existing code, or wipe the slate clean and say that for a package to be importable it has to follow a new spec. dep decided on the former, and my impression is that this has had unfortunate consequences, because it inherits a lot of historical baggage.

We've been using dep for a while (having escaped the bugfest that is Glide, which used a very similar approach), and it's pretty evident that the solver is buggy and slow and also complicated enough that fixing issues like [1] can only be done by a select few that already understand the codebase. I'm not in a position to judge what the causes of all of these issues are, though I'd wager they're not entirely unrelated to the inherent complexity of SAT solving. The current dep issue tracker is full [2] of reports mentioning the solver, not to mention that dep currently has problems with known libraries such as the Kubernetes client [3] and Protobuf. (Google-related projects have historically used godep.) Again, possibly related to this specific implementation and not necessarily something that would apply to a hypothetical "Cargo for Go", but I don't know.

Any idea how Cargo compares to dep overall?

[1] https://github.com/golang/dep/issues/1306 (this one is a nightmare if you work on anything related to Kubernetes)

[2] https://github.com/golang/dep/issues?q=is%3Aissue+is%3Aopen+...

[3] https://github.com/golang/dep/issues/1207


Rust was actually around for a good while before Cargo was adopted. (In fact, there were two Cargos, the older of which bore very little resemblance to the Cargo of today.) Of course, Go was stable for longer.

I have to admit I'm a bit confused as to why the dependency resolution algorithm in dep is seen as slow. The speed of the solver is not a problem in any other package management system I've seen. If it is indeed the solver that is the problem (which, again, I'm skeptical of—I'd have to see profiling data to believe it), then it could just come down to optimization differences between rustc and Go 6g/8g.

> The speed of the solver is not a problem in any other package management system I've seen

This! I've never heard anyone complain about this aspect of a package manager, EVER. vgo seems to be optimizing for a problem no one has.

> The speed of the solver is not a problem in any other package management system I've seen

Really? It has been a large problem in Debian, for instance, and has prompted a lot of research (https://scholar.google.fr/scholar?q=debian+solver). One of the reasons for Fedora's yum -> dnf change was also a change of solver. It's a hard problem that affects a lot of people.

To clarify, I (and the OP) was specifically talking about programming-language-specific package managers.
Puppet dependency managers have also been an issue; it's certainly an easy problem to solve poorly.
You can't compare language and OS package managers. They are on a completely different scale.
Scala / SBT has particularly slow dependency resolution.
Go has always optimised for build speed. I guess they considered dependency resolution as part of the build process.

Which it technically is, I suppose, but when you're coding and iterating the code-build-run loop you generally don't need to add new dependencies each time. And that's when the build speed matters, of course.

And most CIs cache build dependencies these days, which is an easy workaround.
I once waited for a long time for an older Haskell dependency solver to contemplate a situation before it gave up.
I've seen aptitude get really confused about what to install and spend a few minutes computing a solution.

Once.

10 years ago or so, I don't even remember.

I should clarify that I'm referring to language package managers. Their problem domains are significantly different than those of system package managers.
I'm curious, what makes the problem domains different?

I'm asking because I'm interested in "universal" package managers like nix.

Here [1] is the "dep ensure -v" output for a project of mine. It takes almost 12 seconds even when there are no changes to the actual file. I don't know why, or whether it's actually the solver (though the output seems to indicate it).

[1] https://gist.github.com/atombender/7c28f1d371fcb139e1e742a08...

As you can see from the output, it is not the solver per se, but the weird idiosyncrasies of Go imports and the GOPATH layout. 'satisfy' and 'select-atom' and such are the solver bit and take about 20ms all together. A SAT solver is 20ms, MVS might be 1ms, but who cares about that difference, right?

The top 3 items there are slow because they're:

1. 'source-exists' (~6s), which makes network requests to find out whether a project exists to be downloaded or is already in the cache; it's network-I/O heavy in most cases.

2. 'list-packages' (~3s), which parses the downloaded source code for import statements to find further dependencies; it's disk-I/O heavy, and the Go loader has to do some work.

3. 'gmal', i.e. GetManifestAndLock (~2s), which looks for lock files, including those of other dependency solvers; mostly disk I/O, I think.

Any system designed under the constraints that it cannot use a centralized registry/list, that it must be compatible with things not using this system (and so must parse their code), etc., will have these problems regardless of the algorithm.

Those steps are all doing network/disk-io/go-parsing, and none of that is SAT solving.

I don't think vgo has these problems, because vgo is built by the Go team and can dictate far more, such as the use of a centralized repo, that all dependencies must use vgo, etc.

Thanks for the explanation!

The fact that dep parses import statements (as does Glide) is something I've never liked. It means that if you run "dep ensure --add" on something not yet imported, it will complain, and the next "ensure" will remove it. This is never in line with how I actually work. I need the dependencies before I can import them! There's no editor/IDE in existence that lets you autocomplete libraries that haven't been installed yet.

It also means that "dep ensure" parses my code to discover things not yet added to Gopkg.toml. That's upside down to me. I want it to parse its lockfile and nothing else; the lockfile is what should inform its decisions about what to install so that my code works, my code shouldn't be driving the lockfile! If I try to compile my code and it imports stuff that isn't in the lockfile, it should fail, and dep shouldn't try to "repair" itself with stuff that I didn't list as an explicit dependency.

I'm sure there are edge cases where the current behaviour can be considered rational, but I don't know what they are. As you point out, dep has to do a lot of work -- but why? Running "dep ensure" when the vendor directory is in perfect sync with the lockfile should take no time at all, and certainly shouldn't need to access the network. Yet it takes the same amount of time with or without a lockfile.
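
To make the "in sync means no work" idea concrete, here is a minimal sketch in Go. It assumes a hypothetical lockfile that records one content digest per vendored project; the names digest and inSync, and the in-memory maps standing in for the vendor directory, are invented for illustration and are not how dep is implemented. A check like this still reads the vendor tree from disk, but it needs neither the network nor any parsing of import statements.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// digest hashes a set of files (path -> contents) in a stable order.
func digest(files map[string]string) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	h := sha256.New()
	for _, p := range paths {
		// Length-prefix each field so different splits cannot collide.
		fmt.Fprintf(h, "%d:%s%d:%s", len(p), p, len(files[p]), files[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// inSync reports whether every project in the (hypothetical) lock has a
// vendored copy with the recorded digest, and vendor holds nothing extra.
func inSync(lock map[string]string, vendor map[string]map[string]string) bool {
	if len(lock) != len(vendor) {
		return false
	}
	for project, want := range lock {
		files, ok := vendor[project]
		if !ok || digest(files) != want {
			return false
		}
	}
	return true
}

func main() {
	// Made-up vendored project, held in memory for the example.
	vendor := map[string]map[string]string{
		"github.com/example/lib": {"lib.go": "package lib\n"},
	}
	lock := map[string]string{
		"github.com/example/lib": digest(vendor["github.com/example/lib"]),
	}
	fmt.Println(inSync(lock, vendor)) // true: nothing to do

	vendor["github.com/example/lib"]["lib.go"] = "package lib // edited\n"
	fmt.Println(inSync(lock, vendor)) // false: vendor drifted from the lock
}
```

With a check of this shape, a tool could return immediately when inSync is true and only fall back to the slower resolution path when it is false.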

Small note, this isn’t something that you’ve said, but since we’re comparing the two in this sub thread overall, Cargo doesn’t require a central registry either. You can pull straight from version control, and the lock file will even keep track of what HEAD is at the time, maintaining reproducibility. Or from the file system. Etc.

Thanks for your comments here, there’s a lot of stuff I wasn’t aware of. Very illuminating.

That's quite weird. When I run `rm Cargo.lock && cargo generate-lockfile` on the Servo repo (test performed on the cheapest VPS that money can buy) it exits near-instantly (after first spending three seconds trying to git-fetch new versions of the dozen custom dependencies that live on Github rather than crates.io). For reference, here's what Servo's dependency graph looked like two years ago (July 2016): https://dirkjan.ochtman.nl/files/servo-graph.svg ; the number of transitive dependencies is quite large and yet the runtime of version selection is negligible.
It's not strange at all for go.

Because third party go packages may not have a dep file, and because go programmers expect vendor directories to be minimal and not include unused imports, dep parses all of the go code of the project, and all the project's transitive dependencies.

It has to parse every .go file to find all 'import' statements, and it also has to find remote versions by making multiple network requests per dependency (typically 1 http-get + 1 git pull operation).
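
As a rough illustration of the import-scanning step described above, here is a sketch using the standard go/parser package in ImportsOnly mode, which stops after the import declarations. The helper name importsOf and the example source are made up; this is not dep's actual code, which also has to walk directories, handle build tags, and so on.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"sort"
	"strconv"
)

// importsOf returns the sorted, de-duplicated import paths declared in one
// Go source file. parser.ImportsOnly skips everything after the imports.
func importsOf(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	var paths []string
	for _, spec := range f.Imports {
		// Path.Value is the quoted literal, e.g. "\"fmt\"".
		p, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		if !seen[p] {
			seen[p] = true
			paths = append(paths, p)
		}
	}
	sort.Strings(paths)
	return paths, nil
}

// exampleSrc is a made-up file used only for illustration.
const exampleSrc = `package demo

import (
	"fmt"
	"github.com/example/lib"
)

func Hello() { fmt.Println(lib.Name) }
`

func main() {
	paths, err := importsOf("demo.go", exampleSrc)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths) // [fmt github.com/example/lib]
}
```

Each file is cheap on its own; the cost comes from doing this for every .go file in the project and in all of its transitive dependencies.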

This is obviously going to be much slower than cargo where it's assumed every dependency is also using cargo and all needed information is present in metadata files... and there's one single fast api to download data from and cache (crates.io).

If cargo had to do the equivalent of `cargo check`-style parsing to find all 'extern crate' and 'use' statements before it could spit out a valid lock, and it couldn't use only 1 request to update all crates.io data, it would probably be closer to the speed of dep.

I think the speed difference is thus largely a result of go's lack of a central repository and lack of a unified packaging solution.

Thanks for the insight, that's very helpful. That would confirm what I suspected: it's not the core solving algorithm that's slow. Rather what's slow is building the graph in the first place.
Yes, I had overlooked that Go probably doesn't have anything like the crates.io index (https://github.com/rust-lang/crates.io-index) to allow instant discovery of versioning metadata. And, AFAICT, even MVS would have the same problem here and would take the same time to resolve, since it still needs to access the network to fetch remote repos to discover versioning metadata; rather than pointing the finger at SAT solvers, it looks like vgo should be tackling Go's lack of a central package host (the vgo manifesto mentions "proxies", but it seems that those are just intended for solving the problem of persistent availability).
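
For what it's worth, the point about MVS can be sketched in a few lines of Go. This is a simplified illustration, not vgo's code: versions are plain integers rather than semver, the graph is a made-up in-memory map, and fetchReqs is a hypothetical stand-in for whatever discovers a module version's requirements. The selection itself is just "keep the highest version requested", but the traversal still has to call fetchReqs once per reachable (module, version) node, and that is where network or disk time would go.

```go
package main

import "fmt"

// req is one requirement: a module path and the minimum version needed.
// Versions are plain ints to keep the sketch short; real tools compare semver.
type req struct {
	mod string
	ver int
}

// fetchReqs stands in for the expensive step: discovering what a given
// module version requires. In a real tool this may hit the network or disk.
type fetchReqs func(r req) []req

// buildList visits every reachable (module, version) node and keeps the
// highest version requested for each module. It also counts the fetches.
func buildList(root []req, fetch fetchReqs) (selected map[string]int, fetches int) {
	selected = map[string]int{}
	visited := map[req]bool{}
	queue := append([]req(nil), root...)
	for len(queue) > 0 {
		r := queue[0]
		queue = queue[1:]
		if visited[r] {
			continue
		}
		visited[r] = true
		if r.ver > selected[r.mod] {
			selected[r.mod] = r.ver
		}
		fetches++
		queue = append(queue, fetch(r)...)
	}
	return selected, fetches
}

func main() {
	// Hypothetical dependency graph, held in memory for the example.
	graph := map[req][]req{
		{"a", 1}: {{"c", 1}},
		{"b", 1}: {{"c", 2}},
		{"c", 1}: nil,
		{"c", 2}: nil,
	}
	fetch := func(r req) []req { return graph[r] }

	selected, fetches := buildList([]req{{"a", 1}, {"b", 1}}, fetch)
	fmt.Println(selected, fetches) // map[a:1 b:1 c:2] 4
}
```

Here the selection logic is trivial, yet four metadata lookups are still needed for a three-module graph; without a fast index, each of those could be a remote repository fetch.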
We use Chef (Ruby). Cookbook version solving has been a repeatedly painful experience, and occasionally the solver never finds a solution. When this happens, you get to manually dig around and find the culprit.
> The speed of the solver is not a problem in any other package management system I've seen.

The package manager in YaST (SUSE Linux's sysadmin tool) was notorious for its slow solver (and slow everything-else, for that matter) around 2006, when I started using Linux. It improved a lot in the openSUSE 11.x series around 2007/8 when they switched from a homegrown solver to a standard SAT solver package.