by rsc 1240 days ago
Thanks for the quick rollback.

I want to encourage you to think about locking in the current archive details, at least for archives that have already been served. Verifying that downloaded archives have the expected checksum is a critical best practice for software supply chain security. Training people to ignore checksum changes is training them to ignore attacks.

GitHub is a strong leader in other parts of supply chain security, and it can lead here too. Once GitHub has served an archive with a given checksum, it should guarantee that the archive has that checksum forever.


I've just had a thought. When GitHub do update the hashing for better compression, everyone relying on the tar hash will update their hashes. This is the ultimate opportunity to change the tar contents, affect the supply chain, introduce vulnerabilities, and have everyone trust you. Something like Nix, which computes the NAR Hash (the result of the tar contents), will not be affected by this, since it only cares about the content. I think this is much better than worrying about an unlikely tar vulnerability. In a system that only trusts the tar hashes, the original source is not able to take advantage of better compression over time without massive risk of supply chain attack. If you think you can hand me a tarball that can run arbitrary code, for any version of tar that has ever existed, please give it to me so I can experiment with exploits, and I'll buy you a drink of your choice at FOSDEM if you're there!
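
To make the compression point concrete, here is a minimal Python sketch. The payload is a made-up stand-in for an uncompressed tar stream, and hashing the decompressed bytes is only a rough proxy for a content hash (a real NAR Hash covers a serialization of the file tree, which this does not reproduce).

```python
import gzip
import hashlib

# Hypothetical payload standing in for the uncompressed tar stream.
payload = b"".join(f"line {i}: hello world\n".encode() for i in range(2000))

# Same bytes, two compression levels. mtime=0 keeps the gzip header
# free of timestamps so the only variable is the compression setting.
fast = gzip.compress(payload, compresslevel=1, mtime=0)
best = gzip.compress(payload, compresslevel=9, mtime=0)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hash of the compressed file: changes when the compressor settings change.
file_hashes_differ = sha256_hex(fast) != sha256_hex(best)

# Hash of the decompressed content: unaffected by how it was compressed.
content_hashes_match = (
    sha256_hex(gzip.decompress(fast))
    == sha256_hex(gzip.decompress(best))
    == sha256_hex(payload)
)

print(file_hashes_differ, content_hashes_match)
```
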
You're not wrong, but you're also not being realistic.

Nix is not the only system that takes this approach. The Go modules "directory hash" is roughly equivalent, although we defined it in terms of somewhat more standard tooling: it is the output of

    sha256sum $(find . -type f | sort) | sha256sum
I am not here advocating that everyone switch to this basic directory hash either, because it's not a solution to the more general problem that many systems are solving, namely validating _any_ downloaded file, not just file archives.
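
For anyone who wants to play with the idea, the shell pipeline above can be mirrored in a few lines of Python. This is a sketch of that pipeline only, not Go's actual implementation. It assumes plain file names (no newlines or backslashes, which `sha256sum` would escape) and byte-order sorting, as `sort` does under `LC_ALL=C`. The function name `directory_hash` is made up for illustration.

```python
import hashlib
import os


def directory_hash(root: str) -> str:
    """Mimic `sha256sum $(find . -type f | sort) | sha256sum` run inside root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # `find -type f` matches regular files only, not symlinks.
            if os.path.isfile(full) and not os.path.islink(full):
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                paths.append(("./" + rel, full))

    # Sort by raw bytes of the displayed path (assumed C-locale ordering).
    paths.sort(key=lambda item: item[0].encode())

    outer = hashlib.sha256()
    for shown, full in paths:
        with open(full, "rb") as f:
            inner = hashlib.sha256(f.read()).hexdigest()
        # sha256sum prints "<hex digest>  <path>" followed by a newline.
        outer.update(f"{inner}  {shown}\n".encode())
    return outer.hexdigest()
```

Because only paths and file contents feed the digest, timestamps, ownership, and the compression of whatever archive delivered the files do not influence the result.
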

There are widespread, standard tools to run a SHA256 over a downloaded file, and those tools work on _any_ downloaded file. Essentially every programming language ships with or has easily accessible libraries to do the same. In contrast, there are not widespread, standard tools or libraries for the "NAR Hash" nor the Go "directory hash". Even if there were, such tools would need to be able to parse every kind of file that people might be downloading as part of a build, not just tar files.
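
As an illustration of how little is needed for the whole-file approach, here is a minimal Python sketch using only the standard library. The helper names `sha256_file` and `verify_download` are made up for this example.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """Check a downloaded file of any type against a recorded checksum."""
    return sha256_file(path) == expected_hex.strip().lower()
```

Nothing here knows or cares whether the file is a tarball, a zip, or a single binary, which is the interoperability point being made.
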

It's a good solution in limited cases such as Nix and Go modules, but it's not the right end-to-end solution for all cases.

When you say it is not the right end-to-end solution for all cases, I am wondering what case you have in mind that a NAR Hash would not be suitable for.

If you adopt Nix fully, the signed .narinfo file that cache.nixos.org (a Nix substituter) serves contains both the NAR Hash and the hash of the NAR archive file. Additionally, NAR packs and unpacks deterministically, and you can read the implementation in the Nix thesis.

A .narinfo file looks like this:

```

StorePath: /nix/store/xvp2wr01fi27j0ycxqmdg6q4frsiv82s-libnotify-0.8.1
URL: nar/0a4jjqxwjcnnaia76l64drq9bjw7jczgmrirzshgp0bnw621f1c9.nar.xz
Compression: xz
FileHash: sha256:0a4jjqxwjcnnaia76l64drq9bjw7jczgmrirzshgp0bnw621f1c9
FileSize: 24324
NarHash: sha256:02bh3qjxgph5g9di3q553k87w4kbc4drmflkfz9knqbp9jip98c5
NarSize: 101776
References: 7ncncvnr864iangwbvbgbanx1r6wpf79-gdk-pixbuf-2.42.10 i4dqcpppyyq5yqcvw95mv5s11yfyy8pf-glib-2.74.3 xvp2wr01fi27j0ycxqmdg6q4frsiv82s-libnotify-0.8.1 yzjgl0h6a3qh1mby405428f16xww37h0-glibc-2.35-224
Deriver: 2vjs6q5j5vqckcwsvmh5lajvx3p7arkj-libnotify-0.8.1.drv
Sig: cache.nixos.org-1:IqCAJROaqNx4TthRv9V47/dM7KP4sR+bBWBfL+9xSqQHAezcfczYdJhKj8nl5l+iFnj8O4uTIJMWNOcwVq8+AA==

```
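
Since the format is just `Key: value` lines, reading the fields takes very little code. The following Python sketch assumes one field per line, as in the example above. `parse_narinfo` is a made-up helper, not part of any Nix tooling, and it does not verify the `Sig` field.

```python
def parse_narinfo(text: str) -> dict:
    """Parse `Key: value` lines into a dict; References becomes a list."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, since values such as
        # "sha256:..." and the signature contain colons themselves.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'Key: value' line: {line!r}")
        fields[key.strip()] = value.strip()
    if "References" in fields:
        # References is a space-separated list of store path names.
        fields["References"] = fields["References"].split()
    return fields


# Usage with a few of the fields shown above.
example = """\
Compression: xz
FileSize: 24324
NarSize: 101776
"""
info = parse_narinfo(example)
print(info["Compression"], int(info["NarSize"]) / int(info["FileSize"]))
```
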

> If you adopt Nix fully, ...

The case where Nix is not adopted fully is the one I have in mind.

This is the only case then?
My point is about (1) the broader ecosystem of tools that may need to interoperate and have easy access to "SHA256 the whole file" and (2) the fact that not everything is a tar file that the Nix tools can process. So yes, that's the "only" case.
I would also appreciate stronger advertising of the ability to turn a Git tag into a GitHub release and upload stable source code files to it. Maybe even a button in the GitHub releases interface to “generate source tarball and attach as stable tarball to this release.”
But this isn’t a great solution, because afterwards there are three or four source download links, only some of which are stable.

Not to mention that it forces people to use GitHub releases instead of just tags (which excludes every repository that is a mirror of somewhere else).

I agree this would be great. However, it should also stop you from providing useless tarballs (as `/archive/` does today) if:

- you use autoconf (or any other tool(s) that require generating code into the source archive); or
- you have submodules (to which `git archive` is completely blind).

Note that `git-archive-all`[1] can help as long as your submodules don't do things like `[attr]custom-attr` in their `.gitattributes` as it is only allowed in the top-level `.gitattributes` file and cannot be added to the tree otherwise.

[1] https://github.com/roehling/git-archive-all

Yeah, it would be nice if you could disable the generated archive links for releases or at least de-emphasize them.