Hacker News
by dijit 880 days ago
I didn't need to use Bazel; I like Bazel and want to learn more about it.

I also have a small, but burning, passion for reproducible builds, distributed compilation and distributed caching.

Being able to build an entire OS and essentially anything I want on top in a reproducible and relatively organic way (with incremental compilation) is pretty dope.

2 comments

You sound like the perfect Nix cult memb… erm, user. It’s everything you describe and more (plus the language is incredibly powerful compared with Starlark).

But you speak from sufficient experience that I presume Nix is a “been there, done that” thing for you. What gives?

Nix isn't as fine-grained as Bazel as I understand it? I don't think it's incremental within a package, which is presumably what dijit achieved.
Weirdly enough I came across a blog post last week that talked about exactly this. https://j.phd/nix-needs-a-native-build-system/

Nix can be used as a build system in the same way that Bazel can. It already has all of the tooling: a fundamental representation of a hermetic DAG, caching, access to any tool you need, and a vast selection of libraries.

The only catch is that no one has publicly written a build system on top of it yet. I’ve seen it done in a couple of companies, though, as using Nix to only partially manage builds can be awkward due to caching loss (if your unit of source is the entire source tree, a tiny change is an entirely new source).

Nix can do it incrementally. You could split the build into multiple derivations that get built into one package. For Rust there is the excellent https://crane.dev/index.html project.

Or you can go to the extreme and do a 1:1 source-to-derivation mapping. For example, if your project has 100 source files, it could be built from 100 derivations; the language and CLI tools are flexible enough for that.

https://discourse.nixos.org/t/distributed-nix-build-split-la... https://discourse.nixos.org/t/per-file-derivations-with-c/19...

I don't know, though, whether there are any mature Nix tools that make this work efficiently. In theory it's very possible; I'm just unsure about the practicality and overheads.
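To make the granularity point concrete, here is a rough sketch in Python (not Nix) of why a per-file cache key survives a small edit while a whole-tree key does not. Everything in it is made up for illustration: the file names, the helper functions, and the hashing scheme are assumptions, and it ignores dependencies between files (headers and so on), which a real per-file setup would have to track.

```python
import hashlib


def digest(*parts: bytes) -> str:
    """Hash a sequence of byte strings, length-prefixed to avoid ambiguity."""
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(8, "big"))
        h.update(p)
    return h.hexdigest()


def tree_key(sources: dict[str, bytes]) -> str:
    """One cache key for the entire source tree (coarse-grained unit)."""
    parts = []
    for name in sorted(sources):
        parts += [name.encode(), sources[name]]
    return digest(*parts)


def per_file_keys(sources: dict[str, bytes]) -> dict[str, str]:
    """One cache key per source file (the 1:1 mapping described above)."""
    return {name: digest(name.encode(), body) for name, body in sources.items()}


def changed_units(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Names whose key differs, i.e. the units that would need rebuilding."""
    return sorted(name for name in after if before.get(name) != after[name])


# A hypothetical project with 100 source files, then a one-file edit.
old = {f"src/file{i}.c": f"int f{i}(void) {{ return {i}; }}".encode() for i in range(100)}
new = dict(old)
new["src/file42.c"] = b"int f42(void) { return 4242; }"

tree_changed = tree_key(old) != tree_key(new)              # whole tree is "new"
stale = changed_units(per_file_keys(old), per_file_keys(new))  # only one unit is stale
```

With the whole tree as the unit, the single edit invalidates everything; with per-file keys, 99 of the 100 cached results are still valid. Whether the per-derivation overhead eats that gain is exactly the open question.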

Nix is basically merely a quirky functional programming language that generates shell scripts to be run in a sandbox for the actual build. It is not a great tool for within-a-project building; its minimal unit of work has a pretty high overhead.
Nix has decentralized caching and memorizing?
Decentralised caching, absolutely - unless I’m misunderstanding what you mean there. You can build across many machines, merge stores, host caches online with cachix (or your own approach), etc. I make fairly heavy use of that, otherwise my CI builds would be brutal.

Memorizing isn’t a term I’m familiar with in this context.

Sorry - memoizing.

I am interested in making a system that can memoize large databases from ETL systems and then serve them over iroh or ipfs/torrent. A process that might take a supercomputer a week could then have the same code run on a laptop, which would notice that a university supercomputer has already done the work and grab that result automatically from the decentralized network of everyone using the software (who downloaded the ETL database).

That way you save compute and time.

Oh I see!

Yes, absolutely doable in Nix.

Derivations are just a set of instructions combined with a set of inputs, and a unique hash is made from that.

If you make a derivation whose result is the invocation of another, and you try and grab the outcome from that derivation, here’s what will happen:
- it will generate the hash
- it will look that hash up in your local /nix/store
- if not found, it will look that hash up in any remote caches you have configured
- if not found, it will create it using the inputs and instructions

This is transitive so any missing inputs will also be searched for and built if missing, etc.
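As a rough sketch of that lookup order, here is the same idea in Python rather than Nix. This is a toy model, not how Nix is actually implemented: the `Drv` class, the `realise` function, the dict-based "stores", and the two-step ETL example are all hypothetical names chosen for illustration.

```python
import hashlib


class Drv:
    """Toy derivation: instructions plus inputs, keyed by a hash of both."""

    def __init__(self, instructions, inputs=(), build=None):
        self.instructions = instructions
        self.inputs = list(inputs)
        self.build = build  # callable taking the list of realised input results
        h = hashlib.sha256(instructions.encode())
        for dep in self.inputs:
            h.update(dep.key.encode())  # input hashes feed into this hash
        self.key = h.hexdigest()


def realise(drv, local, remotes, log):
    """Local store first, then remote caches, then build (recursing into inputs)."""
    if drv.key in local:
        log.append(("local", drv.instructions))
        return local[drv.key]
    for remote in remotes:
        if drv.key in remote:
            local[drv.key] = remote[drv.key]  # copy the cached result locally
            log.append(("remote", drv.instructions))
            return local[drv.key]
    # Not cached anywhere: realise the inputs the same way, then build.
    deps = [realise(dep, local, remotes, log) for dep in drv.inputs]
    result = drv.build(deps)
    local[drv.key] = result
    log.append(("built", drv.instructions))
    return result


# Hypothetical two-step pipeline; someone else already published the first step.
extract = Drv("extract", build=lambda deps: "raw")
transform = Drv("transform", [extract], build=lambda deps: deps[0].upper())

remote_cache = {extract.key: "raw"}
local_store = {}

first_log = []
out = realise(transform, local_store, [remote_cache], first_log)

second_log = []
again = realise(transform, local_store, [remote_cache], second_log)
```

On the first run the expensive `extract` step is fetched from the remote cache and only `transform` is built; on the second run nothing is built at all because the result is already in the local store.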

So if the outcome from your process is something you want to keep and make accessible to other machines, you can do that.

If the machines differ in architecture, the “inputs” might differ between machines (e.g. clang on Mac silicon is not the same as clang on x86-64) and that would result in a different final hash, thus one computation per unique architecture.

This is ultimately the correct behaviour as guaranteeing identical output on different architectures is somewhat unrealistic.
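A small Python sketch of why the architecture ends up in the hash, assuming a simplified key made from the instructions plus a set of named inputs. The `drv_key` helper and the compiler labels are hypothetical; only the system strings follow Nix's naming.

```python
import hashlib


def drv_key(instructions: str, inputs: dict[str, str]) -> str:
    """Key = hash of the instructions plus every named input, in sorted order."""
    h = hashlib.sha256(instructions.encode())
    for name in sorted(inputs):
        h.update(b"\0" + name.encode() + b"\0" + inputs[name].encode())
    return h.hexdigest()


etl = "run-etl --input dataset-v1"  # hypothetical instructions, same on both machines

# Same instructions, but the toolchain input differs per architecture.
key_x86 = drv_key(etl, {"system": "x86_64-linux", "compiler": "clang-for-x86_64-linux"})
key_arm = drv_key(etl, {"system": "aarch64-darwin", "compiler": "clang-for-aarch64-darwin"})

# Two machines of the same architecture agree on the key and can share results.
key_x86_other = drv_key(etl, {"compiler": "clang-for-x86_64-linux", "system": "x86_64-linux"})
```

Here `key_x86` and `key_arm` differ, so each architecture computes (or fetches) its own result, while `key_x86` and `key_x86_other` match and would hit the same cache entry.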

I see. Perhaps the added benefit I am trying to create with this other system is that specifying remote locations isn't necessary, and is just inherited as the distributed network. Anytime anyone runs it, they're added to the network, so it scales with the number of users.
You should check out the ChromeOS Bazelification project[1]. It has those exact same goals. Not all packages are reproducible though because they embed timestamps.

[1]: https://chromium.googlesource.com/chromiumos/bazel/+/HEAD/do...