
by Onavo 113 days ago

I will make it simpler to understand. There is only one thing that makes or breaks package resolution: whether you support diamond dependencies, and when.

A diamond dependency is when you have package A depending on packages B and C, where B depends on D@v1 while C depends on D@v2, and v1 and v2 are incompatible versions of D. This is the classic dependency conflict problem, and whether you can resolve it automatically and bundle both versions of D into the final codebase/binary is the most important architectural decision of the package manager.
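
A minimal sketch of the A/B/C/D diamond above, assuming a hypothetical in-memory graph (`DEPS`) keyed by package name only and a made-up helper `find_conflicts`. It only detects the conflict; it says nothing about how any real package manager resolves it.

```python
from collections import defaultdict

# Hypothetical dependency graph for the A/B/C/D example.
# Each package maps to a list of (dependency, required_version) pairs.
DEPS = {
    "A": [("B", "1"), ("C", "1")],
    "B": [("D", "1")],
    "C": [("D", "2")],
    "D": [],
}


def find_conflicts(root, deps):
    """Return {package: sorted versions} for packages required at more than one version."""
    required = defaultdict(set)
    seen = set()
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        for dep, version in deps[pkg]:
            required[dep].add(version)
            stack.append(dep)
    return {pkg: sorted(versions) for pkg, versions in required.items() if len(versions) > 1}


conflicts = find_conflicts("A", DEPS)
print(conflicts)  # {'D': ['1', '2']}
```

A package manager without diamond support has to reject this graph or find a single D that satisfies both B and C; one with support can install both.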

Package managers/ecosystems that support diamond dependencies in most circumstances:

npm (as long as it's not a peer dep), Golang, Rust, Java/.NET (with shading enabled; it's not turned on by default).

With diamond dependency support, in most circumstances you can have arbitrary depth/complexity of dependency resolution.

If you don't support diamond dependencies (basically the rest of the world, Python, Ruby, Dart, Elixir, most lisps in their default setup, statically linked C/C++ in default configurations, maybe Zig too, I am not sure about that one), your dependency tree size is severely limited and it becomes a pseudo SAT problem in some cases if you want optimal dependency resolution.

This is the core algorithmic and architectural limit on package managers. Almost everything else is just implementation and engineering details. Stuff like centralized vs non-centralized repos, package caching proxies, security hashes, chains of trust, vendoring, SLSA/SBOM etc. can all be bolted on as an afterthought, but supporting conflicting upstream dependencies simultaneously requires compliance on the bundler/transpiler/compiler level.

It's also why some languages lend themselves better to tools like Bazel that micromanage every single dependency you have while others do not.

4 comments

ryangibb 113 days ago

(author of the paper here)

My sibling makes a great point about type errors: did you know Cargo (Rust) supports diamond dependencies only where the versions differ in major version[^0]? So you can have exactly the same problem with B depending on D@v1.1 and C depending on D@v1.2 in Cargo. I believe the reason for only supporting concurrent versions with different major versions (to use the paper's parlance) is because packages with different major versions should have incompatible APIs anyway.

[^0]: Or 0 major version and differing minor version -- Cargo has its own definition of semver incompatible
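
To make the footnote concrete, here is a rough sketch of that compatibility rule, assuming plain `x.y.z` version strings (pre-release and build metadata are ignored). The names `compat_key` and `can_coexist` are made up for illustration and are not Cargo APIs.

```python
def compat_key(version: str) -> tuple:
    """Bucket a version by its leftmost non-zero component.

    Two versions in the same bucket are treated as semver-compatible.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if major > 0:
        return (major,)
    if minor > 0:
        return (0, minor)
    return (0, 0, patch)


def can_coexist(a: str, b: str) -> bool:
    """True if the two versions fall in different buckets, so both could be kept."""
    return compat_key(a) != compat_key(b)


print(can_coexist("1.1.0", "1.2.0"))  # False: same major, must unify to one version
print(can_coexist("1.0.0", "2.0.0"))  # True: different major versions
print(can_coexist("0.1.0", "0.2.0"))  # True: 0.x versions differ in minor
```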

> ... and it becomes a pseudo SAT problem in some cases if you want optimal dependency resolution

A couple of clarifications: many dependency resolution algorithms are essentially SAT even if they support concurrent versions (see Cargo). Section 3.3 of the paper might be an interesting read -- it discusses the spectrum of complexity in the problem of dependency resolution, and why some ecosystems' approaches don't work for others. Also, it's generally a 'pseudo SAT problem' (i.e. NP-complete and can be reduced to SAT) to find any valid resolution, not just an optimal one.

> This is the core algorithmic and architectural limit on package managers. Almost everything else is just implementation and engineering details.

I agree, and that's why the paper focuses on the semantics of dependency expression and dependency resolution! But there's a lot more than concurrent versions in the semantics of how package managers express and resolve dependencies, e.g. features, formulae, peer dependencies. The point of the paper is that there's a minimal common core that we can use to translate between package management ecosystems, which we're planning on using to build useful tooling to bridge multilingual dependency resolution.


VorpalWay 113 days ago

> So you can have exactly the same problem with B depending on D@v1.1 and C depending on D@v1.2 in Cargo. I believe the reason for only supporting concurrent versions with different major versions (to use the paper's parlance) is because packages should have incompatible APIs anyway.

Presumably you mean compatible rather than incompatible there?

The Rust ecosystem standardised on semver. This means it is perfectly allowed to use 1.2 in place of 1.1. While you can specify upper bounds for the dependency ranges, that is extremely uncommon in practice. Instead the bounds are just "1.1 or newer semver compatible" etc.

See https://semver.org/ for more on semver (but do note that Rust uses a variation, where it also applies to the leading non-zero component of 0.x).


ryangibb 113 days ago

> Presumably you mean compatible rather than incompatible there?

I've edited for clarity, I mean "because packages with different major versions should have incompatible APIs anyway."

> While you can specify upper bounds for the dependency ranges, that is extremely uncommon in practice.

In https://github.com/rust-lang/crates.io-index I count just under 7000 upper bounds on dependency ranges that aren't just semver in disguise (e.g. not ">=1.0.0, <2.0.0"):

    $ rg --no-filename -o '"req":"[^"]*<[^"]*"' . | grep -Ev '< ?=? ?([0-9]+(\.0){0,2}|0\.[0-9]+(\.0)?)"' | wc -l
    6727
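
For anyone reading the filter rather than running it, here is a rough Python rendering of the same `grep -Ev` pattern. The sample `"req"` fields are hypothetical, apart from the `>=0.0.1, <0.0.2` case that comes up later in the thread, and `is_real_upper_bound` is a made-up helper name.

```python
import re

# Same pattern as the grep -Ev filter: an upper bound that ends the requirement
# and lands on a semver boundary (X, X.0, X.0.0, 0.Y or 0.Y.0).
SEMVER_IN_DISGUISE = re.compile(r'< ?=? ?([0-9]+(\.0){0,2}|0\.[0-9]+(\.0)?)"')


def is_real_upper_bound(req_field: str) -> bool:
    """True if the field has an upper bound that the filter does not discard."""
    return "<" in req_field and SEMVER_IN_DISGUISE.search(req_field) is None


samples = [
    '"req":">=1.0.0, <2.0.0"',  # semver in disguise, filtered out
    '"req":">=0.3, <0.4"',      # semver in disguise for 0.x, filtered out
    '"req":">=0.0.1, <0.0.2"',  # kept
    '"req":">=1.2, <1.5"',      # kept
]
kept = [s for s in samples if is_real_upper_bound(s)]
print(len(kept))  # 2
```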

So it's definitely used. One person's non-breaking change is another's breaking change https://xkcd.com/1172/


VorpalWay 113 days ago

How many of those are between a crate and its proc macro crate? E.g. serde and serde_derive. I believe that is a common use case for exact dependencies, as they are really the same crate but have to be split due to how proc-macros work. But that is getting down in the weeds of the peculiarities of how rustc works.


ryangibb 112 days ago

As far as I can tell, checking for proc macro crates by suffix, only one: ergol -> ergol_proc_macro with >=0.0.1, <0.0.2.

I didn't include singular dependencies (=) in this grep, just upper bounds (< and <=).

Some rough scripting is telling me there are over 600,000 singular dependencies, of which just under 10,000 are proc-macro pairs.


VorpalWay 109 days ago

That is much more than I expected. I guess people are bad at actually following semver fairly often.


Onavo 113 days ago

Very good points. Though to be pedantic, for package managers with concurrent/diamond dependency support, there's nothing stopping you from pulling in every single dependency of every dependency (this is ~linear time with respect to the depth of the dependency tree, since you are not conducting any search here but just pulling them in at face value), and maybe deduplicating in linear/constant time with a Set data structure. In this case it's very obviously not a SAT problem, but it's ridiculously inefficient since there's zero optimization on the dependency tree. The moment you apply optimizations to turn it from a tree into a graph and prune it, it gets closer to, yes, a SAT problem.
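
A rough sketch of that "pull everything in at face value" approach, assuming a hypothetical `REGISTRY` where every dependency is pinned to an exact version (no ranges, no peer deps). `install_set` is a made-up name; there is no search or backtracking here, just a traversal plus a set for dedup.

```python
# Hypothetical registry: (name, version) -> list of exact (name, version) deps.
REGISTRY = {
    ("A", "1.0"): [("B", "1.0"), ("C", "1.0")],
    ("B", "1.0"): [("D", "1.0")],
    ("C", "1.0"): [("D", "2.0")],
    ("D", "1.0"): [],
    ("D", "2.0"): [],
}


def install_set(root, registry):
    """Collect every (name, version) reachable from root, deduplicated with a set."""
    installed = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in installed:
            continue  # already pulled in via another path
        installed.add(node)
        stack.extend(registry[node])
    return installed


result = install_set(("A", "1.0"), REGISTRY)
print(sorted(result))
# [('A', '1.0'), ('B', '1.0'), ('C', '1.0'), ('D', '1.0'), ('D', '2.0')]
```

Both D@1.0 and D@2.0 end up installed, which is only acceptable if the runtime or linker can keep them apart.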


ryangibb 112 days ago

> there's nothing stopping you from pulling in every single dependency of every dependency

It depends on the exact system; for example npm's peer dependencies mean we can reduce from SAT to npm.

But if there is no such functionality (e.g. just the concurrent package calculus with g(v)=v) then yes, I agree.


jaen 113 days ago

The paper does make this distinction under the "Concurrent Versions" property.

Allowing concurrent versions though opens you up to either really insidious runtime bugs or impossible-to-solve static type errors.

This happens e.g. when you receive a package.SomeType@v1, and then try to call some other package with it that expects a package.SomeType@v2. At that point you get undefined runtime behavior (JavaScript), or a static type error that can only be solved by allowing you to import two versions of the same package at the same time (and this gets real hairy real fast).

Also, global state (if there is any) will be duplicated for the same package, which generally also leads to very hard-to-discover bugs and undefined behavior.
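
A small sketch of both failure modes, simulated in Python by loading the same hypothetical source twice as two separate module objects (standing in for `package@v1` and `package@v2`). The module names, `SomeType`, `_registry` and `register` are all made up for illustration.

```python
import types

# Hypothetical package source; both "versions" use identical code to show
# that even then the types and the global state are distinct.
SOURCE = '''
class SomeType:
    pass

_registry = []  # module-level (global) state

def register(item):
    _registry.append(item)
'''


def load(name):
    """Create an independent module object from SOURCE."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module


pkg_v1 = load("package_v1")
pkg_v2 = load("package_v2")

value = pkg_v1.SomeType()
print(isinstance(value, pkg_v2.SomeType))  # False: same name, different type

pkg_v1.register("x")
print(pkg_v1._registry, pkg_v2._registry)  # ['x'] []: global state is duplicated
```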


Onavo 113 days ago

Good points. Practically speaking though global state is rarely an issue unless it's the underlying framework (hence peer deps).

Modern languages are mostly lexically scoped and using primarily global variables for state aside from Singletons has fallen out of favor outside of embedded unless it's a one off script.


avsm 113 days ago

(one of the paper coauthors here)

While diamond dependencies are indeed one of the big complicating factors, the implementation and engineering details that remain matter a lot too. Section 4 covers the spectrum of quality-of-life features that do introduce subtleties: for example the order of resolution, peer dependencies, depopts/features. These are all important for the ergonomics of package constraint expressions, irrespective of whether diamond dependencies are present or not.

The engineering details also flow from the practical implementation constraints: it makes a big difference if solving can be done in linear time or if there's a noticeable pause or (worse) you need a big centralised solver. The determinism also guides the implementation of chains of trust.


arcatek 113 days ago

It's not about the package manager, it's about the runtime. Python isn't able to support this pattern with its resolution pipeline, so package managers have to resort to doing the work of deduping versions.

By contrast Node.js has built-in capabilities that make this possible, so package managers are able to install multiple versions of the same package without that issue.
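
A simplified sketch of the idea behind that lookup: start in the requiring package's directory and walk up, checking `node_modules/<package>` at each level. The `INSTALLED` layout and the `resolve` helper are hypothetical, and this leaves out most of what the real algorithm does (file extensions, `package.json` fields, and so on).

```python
from pathlib import PurePosixPath

# Hypothetical installed layout: B and C each carry their own copy of D.
INSTALLED = {
    "/app/node_modules/B",
    "/app/node_modules/B/node_modules/D",  # D@v1
    "/app/node_modules/C",
    "/app/node_modules/C/node_modules/D",  # D@v2
}


def resolve(package, from_dir, installed):
    """Return the nearest node_modules/<package> at or above from_dir, or None."""
    current = PurePosixPath(from_dir)
    for directory in [current, *current.parents]:
        candidate = str(directory / "node_modules" / package)
        if candidate in installed:
            return candidate
    return None


print(resolve("D", "/app/node_modules/B", INSTALLED))  # /app/node_modules/B/node_modules/D
print(resolve("D", "/app/node_modules/C", INSTALLED))  # /app/node_modules/C/node_modules/D
```

Because the nearest copy wins, B and C each see their own D without the package manager having to pick one.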


stabbles 113 days ago

It's not just that, it's also a filesystem layout issue. If you install everything in `/usr` or `<venv>/lib/pythonX.Y/site-packages` you cannot have two versions / variants of the same package installed concurrently.

For that you need one prefix per installation, which is what Nix, Guix, and Spack do.


avsm 113 days ago

The runtime can also use mount namespaces to support concurrent installations. Or, if there is a compilation step, the linker can not expose symbols for clashing libraries and just resolve them within the dependency chain.

The package calculus allows all of these to be specified cleanly in a single form.
