Hacker News new | ask | show | jobs
by munificent 1536 days ago
> There is no way for changes in the outside world—such as a new version of a dependency being published—to automatically affect a Go build.

> Unlike most other package managers files, Go modules don’t have a separate list of constraints and a lock file pinning specific versions. The version of every dependency contributing to any Go build is fully determined by the go.mod file of the main module.

I don't know if this was intentional on the author's part, but this reads to me like it's implying that with package managers that do use lockfiles a new version of a dependency can automatically affect a build.

The purpose of a lockfile is to make that false. If you have a valid lockfile, then fetching dependencies is 100% deterministic across machines and the existence of new versions of a package will not affect the build.

It is true that most package managers will automatically update a lockfile if it's incomplete instead of failing with an error. That's a different behavior from Go where it fails if the go.mod is incomplete. I suspect in practice this UX choice doesn't make much of a difference. If you're running CI with an incomplete lockfile, you've already gotten yourself into a weird state. It implies you have committed a dependency change without actually testing it, or that you tested it locally and then went out of your way to discard the lockfile changes.

Either way, I don't see what this has to do with lockfiles as a concept. Unless I'm missing something, go.mod files are lockfiles.

6 comments

There are some subtleties here, but go.mod files are not lockfiles, including go.mod files don't follow a traditional constraint/lockfile split, which I think is part of the point in that snippet you quoted.

Another way they differ is when installing a top-level tool by default npm does not use the exact version from a library's lockfile to pick the selected version of another library as far I as understand, whereas Go does by default use the exact version required by a library's go.mod file in that scenario.

In other words, go.mod files play a bigger role for libraries than a traditional lockfile does by default for libraries in most other ecosystems.

Here's a good analysis on the contrast between go.mod and the default behavior of more traditional lockfiles (using the npm 'colors' incident as a motivating example):

https://research.swtch.com/npm-colors

That link also includes some comments on 'npm ci' and 'shrinkwrap' that I won't repeat here.

All that said, go.mod files do record precise dependency requirements and provide reproducible builds, so it's possible to draw some analogies between go.mod & lockfiles if you want. I just wouldn't say "go.mod files are lockfiles". ;-)

Wouldn't build size increase a lot if transitive dependencies were pinned to direct dependency lockfiles? Like if library A says "use version 1.0.0 of library X" and library B says "use version 1.0.1 of library X", then you'd likely end up bundling duplicate code in your build.

Not saying the tradeoff isn't worth it, but pinning to dependency lockfiles isn't without downsides.

FWIW, that's not what Go does. In your scenario, a Go binary ends up with a single copy of library X -- the 1.0.1 version. That's because library A is stating "I require at least v1.0.0 of X", and library B is stating "I require at least v1.0.1 of X". The minimal version that satisfies both of those requirements is v1.0.1, and that's what ends up in the binary.

That behavior is Go's "Minimal Version Selection" or "MVS". There are many longer descriptions out there, but a concise graphical description I saw recently and like is:

https://encore.dev/guide/go.mod

That's the default behavior, but a human can ask for other versions. For example, a consumer of A and B could do 'go get X@latest', or edit their own go.mod file to require X v1.2.3, or do 'go get -u ./...' to update all their direct and indirect dependencies, which would include X in this case, etc.

Continuing that example -- in Go you end up with v1.0.1 of X by default even if v1.0.2 is the latest version of X.

That is a difference with many other package managers that can default to using the latest v1.0.2 of X (even if v1.0.2 was just published) when doing something like installing a command line tool. That default behavior is part of how people installing the 'aws-sdk' tool on a Saturday started immediately experiencing bad behavior due to the deliberate 'colors' npm package sabotage that happened that same Saturday.

In any event, it's certainly reasonable to debate pros and cons of different approaches. I'm mainly trying to clarify the actual behavior & differences.

What if the requirement was pinned specifically to 1.0.0 in order to avoid a bug introduced in 1.0.1. With a package that also requires a minimum 1.0.1, that should be unresolvable set of requirements and your package manager should fail to make a lockfile out of it.
how are you not describing package.json right now what is the difference
The difference is the npm ecosystem actively encourages automatically following SemVer because by default it uses a ^ to prefix the version number. https://heynode.com/tutorial/how-use-semantic-versioning-npm...

The practice of dependencies’ dependencies being specified using SemVer version constraints to auto-accept minor or patch changes is the difference compared to Go, and why lockfiles will not always save you in the npm ecosystem. That said, approaches like Yarn zero-install can make very explicit the versions installed because they are distributed with the source. Similarly, the default of using npm install is bad because it will update lockfiles, you have to use npm ci or npm install —ci both of which are less well-known.

So it’s not impossible to fix, just a bad choice of defaults for an ecosystem of packages that has security implications about the same as updating your Go (or JS) dependencies automatically and not checking the changes first as part of code review. Blindly following SemVer to update dependencies is bad, from a security perspective, regardless of why or how you’re doing it.

> The difference is the npm ecosystem actively encourages automatically following SemVer because by default it uses a ^ to prefix the version number.

So does Go. In fact, Go only supports the equivalent of ^, there is no way to specify a dependency as '=1.2.3'. That is, whenever you have two different dependencies which use the same dependency at different (semver compatible) versions, go mod will always download the newer of the two, effectively assuming that the one depending on an older version will also work with the newer.

The only difference in this respect compared to NPM (and perhaps also Cargo or NuGet? I don't know) is that Go will never download a version that is not explicitly specified in some go.mod file - which is indeed a much better policy.

It’s very subtle, but there are some important differences. For example, lockfiles are not recursive in NPM: the NPM package (usually?) does not contain the lockfile and does not adhere to it when installed as a dependency. It will pick the newest version of dependencies that matches the spec in package.json.

Go mod files are used recursively, and rather than try to pick the newest possible version, it will go with the oldest version.

This avoids the node-ipc issue entirely, at least until you update the go.mod.

This really depends on the specific package manager: if you're building an application in Rust, its lockfile will contain the full tree of dependencies, locked to a specific version.
I might be misunderstanding GP, but I think what they're saying is that when package A depends on package B, building package A will use B's lockfile. Assuming that's the case, I think this is generally not how Rust does things, as by default Cargo.lock is explicitly listed in the .gitignore when making a library crate, although there's nothing stopping anyone from just removing that line. I think I remember reading in documentation somewhere that checking in Cargo.lock for libraries is discouraged (hence the policy), but I don't recall exactly where since it's been so long. (That being said, there's a pretty decent chance you were the one who wrote that documentation, so maybe you might remember!)
> I think this is generally not how Rust does things

That is correct.

> as by default Cargo.lock is explicitly listed in the .gitignore when making a library crate

Even if it is included in the contents of the package, Cargo will not use it for the purpose of resolution.

The "don't check it in" thing is related, but not because it will be used if it's included. It's because of the opposite; that way new people who download your package to hack on it will get their own, possibly different Cargo.lock, so you end up testing more versions naturally. Some people dislike this recommendation and include theirs in the package, but that never affects resolution behavior.

Ah, good to know! I'm glad I asked
Yes, this is true; I think the article is definitely making the distinction vs NPM and not Rust.
The article seemed to go out of its way not to mention any specific package manager or ecosystem. So I think comparing to Rust is completely reasonable.
It didn’t mention one by name, but Rust hasn’t been subject to any widely publicized supply chain attacks. They do, however, mention left-pad by name. I think it can be implied that they really did just mean npm.
They definitely should have said that then. Simply referring to "lockfiles" paints with a very broad brush that includes a number of package managers that don't have the problems that NPM does.
This is also true of npm and yarn, as far as I can tell: package-lock.json and yarn.lock contain the exact version of every transitive dependency.
I always forget the exact semantics, but the parent's description of them as "recursive" is not the same as Cargo; Cargo determines the full tree and writes out its own lockfile, if dependencies happen to have a Cargo.lock inside the package, it's ignored, not used.
How does go deal with the diamond dependency issue when transitively you've a dep on the same package via two different paths?
As far as I understand it... the import path that you use to import a package acts as its identity, and only one version of any given package will be installed. The way that it will determine this is by choosing the lowest version specified in any package that depends on a given package. Major versions of packages are required to have different import paths with Go modules, so when depending on two different major versions of the same package, they are treated effectively as their own package.
I think you might have a small typo here:

> by choosing the lowest version specified in any package that depends on a given package

It picks the highest version specified in any of the requirements. (That's the minimal version that simultaneously satisfies each individual requirement, where each individual requirement is saying "I require vX.Y.Z or higher". So if A requires Foo v1.2.3 and B requires Foo v1.2.4, v1.2.4 is selected as the minimal version that satisfies both A and B, and that's true even if v1.2.5 exists).

Yep, I regrettably described the system wrong. It’s the highest version specified, and thus minimum version possible.
That's a nice & concise way to describe it.
Okay but if I depend on A and A depends on C and has it pinned at 1.5, but I also depend on B and B has C pinned on 1.8, then I get 1.5? But what happens if C is on 1.8 because it doesn't work with 1.5 because an API it needs doesn't exist in 1.5?

Are we not talking about the transitively pinned dependencies in the "lock" section, or are we talking about logical constraints?

Logical constraints would make more sense, but if the constraints on C across various transitive deps are > 1.5 and > 1.8, and 1.9 or 1.10 exists, I probably want the last version of those.

> Okay but if I depend on A and A depends on C and has it pinned at 1.5, but I also depend on B and B has C pinned on 1.8, then I get 1.5?

No, you end up with the highest explicitly required version. So 1.8 in that scenario, if I followed. (Requiring 1.5 is declaring support for "1.5 or higher". Requiring 1.8 is declaring support for "1.8 or higher". 1.8 satisfies both of those requirements).

> if the constraints on C across various transitive deps are > 1.5 and > 1.8, and 1.9 or 1.10 exists, I probably want the last version of those.

By default, you get 1.8 (for reasons outlined upthread and in the blog post & related links), but you have the option of getting the latest version of C at any time of your choosing (e.g., 'go get C@latest', or 'go get -u ./...' to get latest versions of all dependencies, and so on).

Also, you are using the word "pin". The way it works is that the top-level module in a build has the option to force a particular version of any direct or indirect dependency, but intermediate modules in a build cannot. So as the author of the top-level module, you could force a version of C if you needed to, but for example your dependency B cannot "pin" C in your build.

I'm fairly sure Go Modules does not support what you’re describing. It specifically avoided having a SAT solver (or something similar), unlike most package managers. You specify a minimum version, and that’s it. 1.8 would be selected because it is the highest minimum version out of the options 1.5 and 1.8 that the dependencies require. Unless you edit your go.mod file to require an even higher version, which is an option. Alternatively, you can always replace that transitive dependency with your own fork that fixes the problems, using a “replace” directive in your go.mod file.

If your dependencies are as broken as you’re describing, you’re in for a world of hurt no matter the solution. I also can't remember ever encountering that situation.

Well after 12 years of using ruby and bundler and maintaining a bundler-ish depsolver at work, and playing around a bit with cargo, I can say that it is becoming clearer as to why I don't grok go modules at all.

The lack of a depsolver is a curious choice...

I don't think my example was remotely "broken" at all, that's just another day doing software development.

If I'm understanding you properly, recursive lockfiles means that if I depend on some chain of dependencies A->B->C->D->E, and E has a security vulnerability that they patch in a new version, I have to wait for A B C D and E to all update their lockfiles before the security vulnerability will be patched on my system?
That's not correct. You can unilaterally decide to update the version of E without waiting for anyone. Alternatively, if only C for example decides to update their required version of E, you would get that version of E if you updated your version of C (directly or indirectly), without needing to directly do anything with E yourself.

Slightly longer explanation of the mechanics here: https://news.ycombinator.com/item?id=30871730

The best complete explanation is probably here: https://research.swtch.com/vgo-principles

But wouldn't that have the same issue then? Developers decide to update their dependencies to patch any security vulnerabilities, and wind up adding installing node-ipc's malicious update
The difference is that it's an explicit choice instead of other package managers who'd happily install latest compromised versions of packages by default.
But if its a manual explicit choice, that adds more friction to these patches, and many developers may not update them at all. It's a trade off
Perhaps

"A module may have a text file named go.sum in its root directory, alongside its go.mod file. The go.sum file contains cryptographic hashes of the module’s direct and indirect dependencies."

And

"If the go.sum file is not present, or if it doesn’t contain a hash for the downloaded file, the go command may verify the hash using the checksum database, a global source of hashes for publicly available modules."

Should be stressed on. If I committed a dependency version (go.mod) and checksum (go.sum) along with the code, either I get a repeatable build everywhere, or build fails if dependency not found or found to be modified.

I am not sure if all other package managers include checksum with dependency version.

> the go command may verify the hash

If we're talking about reproducible builds, the word "may" seems concerning here?

I suspect the primary purpose of the word "may" in that sentence is that you can choose to disable checking the hash against the Certificate Transparency style https://sum.golang.org. In other words, you can opt out. If you do, you fall back to your local go.sum file, which is more-or-less a "TOFU" security model: https://en.wikipedia.org/wiki/Trust_on_first_use

More on sum.golang.org: https://go.googlesource.com/proposal/+/master/design/25530-s...

Thank you for the clarification!
Discussing "what is a lockfile" is a bit of a headache because different languages have different files which do different things. Generally speaking, there's some file which specifies the dependency versions and some file with cryptographic checksums of the all transitive dependencies.

In Go it's go.mod / go.sum. In NPM, it's package.json / package-lock.json. In Rust it's Cargo.toml / Cargo.lock.

Diving into the exact details of what the author is saying is a bit outside my headspace at the moment. I think the author of the article may not actually understand the scenario where Go's package system differs. (I'm not sure I do, either.)

Suppose you have your project, projectA, and its direct dependency, libB. Then libB has a dependency on libC.

If projectA has a lockfile, you get exactly the same versions of libA and libB. This is true for Go, NPM, and Cargo. However, suppose projectA is a new project. You just created it. In Go, the version of libB that makes it into the lockfile will be the minimum version that libA requires, which means that any new, poisoned version of libB will not transitively affect anything that depends on libA, such as projectA. With NPM, you get the latest version of libB which is compatible with libA--this version may be poisoned.

> any new, poisoned version of libB

Conversely, you will get any old security-buggy version of libB instead.

Most package managers when adding a new dependency assume newer versions are "better" than older versions. Go's minimum version system assumes older is better than newer.

I don't think there's any clear argument you can make on first principles for which of those is actually the case. You'd probably have to do an empirical analysis of how often mailicious packages get published versus how often security bug fix versions get published. If the former is more common than the latter, then min version is likely a net positive for security. If the latter is more common than the former, then max version is probably better. You'd probably also have to evaluate the relative harm of malicious versions versus unintended security bugs.

> I don't think there's any clear argument you can make on first principles for which of those is actually the case.

I don't understand why someone would try to argue from first principles here, it just seems like such a bizarre approach.

Anyway, it's not just a security issue. Malicious packages and security fixes are only part of the picture. Other issues:

- Despite a team's promise to use semantic versioning, point releases & "bugfix" releases will break downstream users

- Other systems for determining the versions to use are much more unpredictable and hard to understand than estimated (look at Dart and Cargo)

https://github.com/dart-lang/pub/blob/master/doc/solver.md

https://github.com/rust-lang/cargo/blob/1ef1e0a12723ce9548d7...

> Other systems for determining the versions to use are much more unpredictable and hard to understand than estimated (look at Dart and Cargo)

I'm one of the co-authors of Dart's package manager. :)

Yes, it is complex. Code reuse is hard and there's no silver bullet.

Nice! I hope I wasn't coming across as critical of Dart's package manager, or Cargo for that matter.
It's OK. There are always valid criticisms of all possible package managers. It's just a hard area with gnarly trade-offs.
Every change that fixes a security issue implies the existence of a change that introduced the security issue in the first place. Why is bumping a version more likely to remove security issues instead of introduce them?

The reason why older is better than newer has more to do with the fact that the author has actually tested their software with that specific version, and so there's more of a chance that it actually works as they intended.

Security issues aren't introduced intentionally, oftentimes they are found much later on in code that was assumed to be secure. Like the SSL heartbleed vulnerability. Once a vulnerability like that is discovered, you _want_ every developer to update their deps to the most secure version
My statement had nothing to do with intent. Conversely, once a vulnerability is introduced (intentionally or not), you don't want every developer to update their deps to the newly insecure version.
Exactly, so it's a trade-off, do you want to encourage updates at the risk of malicious updates (like with node-ipc). Or do you want to add friction to updates and thus risk security vulnerabilities persisting for longer. Node chooses one approach, Go chooses the other.
I'd back that down to "most security issues aren't introduced intentionally".
Go just expects you to manually trigger the updates. Thats all. It still is in favor of updating to take security fixes, so i think your argument is wrong.
Let's say my_app uses package foo which uses package bar.

It turns out there is a security bug in bar. The bar maintainers release a patch version that fixes it.

In most package managers, users of my_app can and will get that fix with no work on the part of the author of foo. I'm not very familiar with Go's approach but I thought that unless foo's author puts out a version of foo that bumps its minimum version dependency on bar, my_app won't get the fix. Version solving will continue to select the old buggy version of bar because that's the minimum version the current version of foo permits.

> I thought that unless foo's author puts out a version of foo that bumps its minimum version dependency on bar, my_app won't get the fix. Version solving will continue to select the old buggy version of bar because that's the minimum version the current version of foo permits.

That is incorrect. The application's go.mod defines all dependencies, even indirect ones. Raise the version there, and you raise it for all dependencies. You cannot have one more than one minor version of a dependency in the dependency graph.

That still implies that I need to know to update the version constraint of what may be a very deep transitive dependency, doesn't it?
As I understand it, that's true in the simple case. If you have `my_app -> foo -> bar` then there's only one path to bar, and you only get a new bar when you upgrade foo.

It's more complicated in general, with diamond dependencies. There needs to be a chain of module updates between you and foo, with the minimum case being a chain of length one where you specify the version of foo directly.

So, people do need to pay attention to security patch announcements. But popular modules, at least, are likely to be get updated relatively quickly, because only one side of a diamond dependency needs to notice and do a release.

> As I understand it, that's true in the simple case. If you have `my_app -> foo -> bar` then there's only one path to bar, and you only get a new bar when you upgrade foo.

This is not correct. You can update bar independent of foo directly from the top-level go.mod file in your project.

You keep harping on old buggy version where Go has been very clear that it is operator's explicit responsibility to have correct/updated/fixed versions of dependency running.

It specially does not look good in your case considering you work for Google on a different programing language. If you have a clear point to make then compare it with your approach. Instead of making neutral sounding arguments when they are not.

Don't drag in ad-hominem attacks. If you want to defend Go's approach, explain why having it be the "operator's explicit responsibility" is a good policy, likely to make apps (in general) more secure. The obvious implication of the example given is that, on average, it will be a mess.
The code itself isn’t the only risk factor; it’s weighted by others, like discoverability, which are asymmetric in time. If the white hats fix an issue in version n+1, they’re going to make sure people know. If black hats (or normal devs making a mistake) introduce an issue, no one will tell you about it.

I.e. even if both strategies win just as often, min-version pulls ahead by taking less of a hit from losses.

The author is saying that Go provides the same guarantees with just a package list in the go.mod file that other package managers need both a package list and lock file to solve.

go.sum is essentially a distributed / community maintained transparency log of published versions of packages.

Maybe I'm just not familiar with it enough but I don't see how merging a package manifest and lockfile into a single file is a net win.

This means it's no longer clear which dependencies are immediate and which are transitive. It's not clear which versions are user-authored constraints versus system-authored version selections. For dependencies that are transitive, it's not clear why the dependency is in there and which versions of which other dependencies require it.

Other packages separate these into two files because they are very different sets of information. Maybe Go's minimum version selection makes that not the case, but it still seems user-unfriendly to me to lump immediate and transitive dependencies together.

FWIW, there are two machine-formatted sections of go.mod -- the first for direct dependencies, the second section for indirect dependencies.

(That's as of Go 1.17. Previously, that information was communicated via machine-generated comments in a single section).

That seems reasonable.

I think I personally lean towards keeping them in separate files entirely because I like a clearer separation between human-authored content and machine-derived state.

but if you ever bump up the version of an indirect dependency (maybe to pick up a bugfix earlier), is this now a direct or indirect dependency?
In other systems, you do that by creating a direct dependency. Even though your code doesn't directly import the module, you author an explicit dependency because you care about the version of the module that your app ends up using.

This way, there's a clear separation between human-authored intentional dependencies and the automatically-derived solved versions and transitive dependencies based on that.

If you see a diff in your package manifest, you know a human did that. If you see a diff in the lockfile, you know that was derived from your manifest and the manifests you depend on. The only human input is when they choose to regenerate the lockfile and which packages they request to be unlocked and upgraded. That's still important data because regenerating the lockfile at different points in time will produce different results, but it's a different kind of data and I think it's helpful to keep it separated.

Direct means that your code actually imports it directly instead of transitively. It has nothing to do with the version of that dependency being selected in the go.mod file.
Npm lockfile will refer to same package because you can't republish npm with the same tag (there is also content hash in lockfile).

Referring to version tag in git from go's mod can't guarantee that because you can overwrite tag in git.

Am I wrong?

Yes. The go.sum file that sits alongside go.mod keeps track of the hashes so that no modification like that can be made, and dependency fetches actually transparently go through a module proxy/mirror that keeps those same hashes as well, and it will prevent you from getting an altered version of a known module even if you’re starting a new project and don’t have a sum file yet. Versions can’t be republished.
Thanks for clarification, indeed I can see go.sum being checked in on few go package repos I've checked, nice.