by the_duke 2073 days ago
Nix packages don't contain any source code.

The package definition describes how to fetch the source code from a source (like a Git repo or a hosted archive) and build it. The built result only contains what is necessary at runtime.

A sizeable number of packages don't even fetch source code but a prebuilt binary, which is then fixed up to work with Nix.

There is a source cache, but it is optional.

As an example, check out ripgrep [1]. It uses `fetchFromGitHub` to retrieve the code.

[1] https://github.com/NixOS/nixpkgs/blob/3bb54189b0c8132752fff3...


Note that the source code is referenced by a hash, so it can't change without changing the package. Also, the source code of all packages built by Hydra is on cache.nixos.org alongside the resulting binaries.
True, which is absolutely sufficient for most use cases.

I'm currently doing some work for ML and data science companies where full reproducibility and introspection are very much desired.

So you need to run your own source cache to provide that guarantee, because you can't count on cache.nixos.org still providing the source code from a package built 4 years ago.
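
To make "your own source cache" a bit more concrete, here is a rough Python sketch of the idea, not anything Nix actually ships. The `SourceCache` class, its methods and the `upstream` callback are all hypothetical names for illustration: sources are pinned under their SHA-256, and anything fetched from upstream on a miss is rejected unless it matches the hash you asked for.

```python
import hashlib


class SourceCache:
    """Toy in-memory source cache keyed by SHA-256 (hypothetical, not a Nix API)."""

    def __init__(self):
        self._blobs = {}  # hex digest -> source bytes

    def add(self, data: bytes) -> str:
        """Pin a source blob and return the digest it is stored under."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def fetch(self, expected_sha256: str, upstream=None) -> bytes:
        """Return the pinned blob, falling back to `upstream()` on a miss."""
        if expected_sha256 in self._blobs:
            return self._blobs[expected_sha256]
        if upstream is None:
            raise KeyError(f"{expected_sha256} is not pinned and no upstream was given")
        data = upstream()
        # Refuse anything whose content does not match the pinned hash.
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            raise ValueError("upstream content does not match the pinned hash")
        self._blobs[expected_sha256] = data
        return data
```

Once a source is pinned, `fetch` never has to touch the original host again, which is the guarantee you don't get from relying on a third-party cache.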

But that's why I love the IPFS cache efforts. [1] Running your own node to pin all required sources should then be relatively easy.

[1] https://blog.ipfs.io/2020-09-08-nix-ipfs-milestone-1/

Software Heritage is also helpful here, and Guix is integrating with it - see e.g. https://guix.gnu.org/blog/2019/connecting-reproducible-deplo...
Nix is also integrating with Software Heritage:

https://www.tweag.io/blog/2020-06-18-software-heritage/

Debian's archive reaches back 15 years now: https://snapshot.debian.org/

It also contains source code.

Are those sources target independent or specific for each new build of the package? That is, is there a new source code package on hydra when one of the dependencies changes? Or does it only change if the package itself changes?

Also, they are only available when hydra builds the package anyways, right? So if some package is not built by hydra (like how it used to be for the texlive packages), it'll still download the sources from the various places they are hosted.

As for the hash, it's good that the source code is hashed, but my main concern was that it was downloading from external sources in the first place. This is bad for privacy, as those hosts know I'm downloading from them, as well as for reliability, because the hosts might not have as good uptime as a debian package mirror.

A sibling comment replied to your first paragraph, so just about the second two:

>Also, they are only available when hydra builds the package anyways, right? So if some package is not built by hydra (like how it used to be for the texlive packages), it'll still download the sources from the various places they are hosted.

Yes.

>As for the hash, it's good that the source code is hashed, but my main concern was that it was downloading from external sources in the first place. This is bad for privacy, as those hosts know I'm downloading from them, as well as for reliability, because the hosts might not have as good uptime as a debian package mirror.

That's a true and valid concern, but note that it's the same situation as with Debian: If the package is built upstream by Debian/the NixOS Hydra instance, then you have reliable, private access to its source code so you can rebuild it. If it's not built/packaged upstream, then you need to get the source from somewhere else.

The discrepancy is just that there are packages in Nixpkgs which are not built upstream, and which get built only locally on your machine or your own Hydra instance. There are not many of these, but yeah, it would be nice to fully get rid of them.

Or, an interesting option would be to build the source for more packages on Hydra, without actually building the binary for the package. That wouldn't be too hard, if someone adds an expression for doing it.

> That's a true and valid concern, but note that it's the same situation as with Debian

Good point!

> an interesting option would be to build the source for more packages on Hydra, without actually building the binary for the package. That wouldn't be too hard, if someone adds an expression for doing it.

Yes, that would be awesome!

> Are those sources target independent or specific for each new build of the package? That is, is there a new source code package on hydra when one of the dependencies changes? Or does it only change if the package itself changes?

Fixed-output derivations are used for sources (they are content-addressed in the store), so the latter.
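
A toy Python model of why that is, in case it helps. This is not Nix's real store path algorithm; the `/toy-store` prefix, both function names and the example inputs are made up. The point is only that a content-addressed path depends on the bytes alone, while an input-addressed path also changes whenever a dependency does.

```python
import hashlib


def fixed_output_path(name: str, content: bytes) -> str:
    """Content-addressed: the path is derived from the fetched bytes only."""
    digest = hashlib.sha256(content).hexdigest()[:32]
    return f"/toy-store/{digest}-{name}"


def input_addressed_path(name: str, build_script: str, dep_paths: list) -> str:
    """Input-addressed: the path is derived from the build recipe and its inputs."""
    h = hashlib.sha256()
    h.update(build_script.encode())
    for dep in sorted(dep_paths):  # sort so the order of inputs does not matter
        h.update(dep.encode())
    return f"/toy-store/{h.hexdigest()[:32]}-{name}"


# The source path stays the same no matter which libc the binary is built against.
src = fixed_output_path("example-src", b"fn main() {}")
bin_old = input_addressed_path("example", "cargo build", [src, "/toy-store/libc-1"])
bin_new = input_addressed_path("example", "cargo build", [src, "/toy-store/libc-2"])
```

Here `bin_old` and `bin_new` differ because one input changed, but `src` is only recomputed if the source bytes themselves change.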

Binary packages make me sad too and I wish there were a way to mark them as such using a `meta` key.

It starts to become a bit of a grey area in some cases though. For instance, Java packages. Is a .jar a binary? Probably. But so many Java applications rely on pulling loads of .jars down from Maven. Are we going to sit down and figure out how to build all those jars from source? It's not uncommon for there to be literally hundreds.

I find that binary packages are quite rare for free/open source software. So avoiding proprietary software (with, e.g., allowUnfree = false) gets most of the way there.

I agree that it would be nice to tag (with meta) FOSS packages that aren't built from source, though. Every instance of that is a bug, IMO...

> I find that binary packages are quite rare for free/open source software.

Unfortunately this isn’t the case with some languages that have their own package managers, the prime example being Java, as the parent commenter mentioned. It’s so nearly impossible to build Java applications without fetching tons of binary jars from Maven that Debian just gives up on providing its own package in many cases[1]. While Nix does build Java applications from source, the dependencies are fetched from Maven in binary form.

[1]: https://wiki.debian.org/Hadoop