by onedognight 970 days ago
Rebuilding the minimal ISO from source is an impressive milestone on the journey to a system that builds from source reproducibly. Guix had an orthogonal but equally impressive milestone on the same journey recently[0], bootstrapping a full compiler toolchain from a single reproducible 357 byte binary without any other binary compiler blobs. These two features may one day soon be combined to reproducibly build a full distribution from source.

[0] https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...

7 comments

That is amazing and it is great to see there are people out there fighting the good fight (while others ask: "but where's the benefit!? if there's a backdoor, everybody is still going to get the backdoor!").

> it gives us a reliable way to verify the binaries we ship are faithful to their sources

That's the thing many don't understand: it's not about proving that the result is 100% trustable. It's about proving it's 100% faithful to the source. Which means that should monkey business be detected (like a sneaky backdoor), it can be recreated deterministically 100% of the time.

In other words for the bad guys: nowhere to run, nowhere to hide.

To me, the largest benefit isn't even related to "bad guys", but rather in being able to understand and debug issues.

Reproducibility makes bugs more shallow. If hydra builds a bit-for-bit identical iso to what you build locally, that means a developer can make a change to the iso inputs, test it, and know that testing will also apply to the final ci-built one.

If a user reports a bug in the iso, and you want to test if a change fixes it locally, you can start from an identical source-code commit as the iso was built from, make some minimal changes, and debug from there, all without worrying that you're accidentally introducing unintended differences.

It minimizes "but it works on my machine" type issues.

Super clarifying, thank you.

It's not yet as far as the Guix stage0, but there was an interesting talk about bootstrapping nix from TinyCC at NixCon: https://media.ccc.de/v/nixcon-2023-34402-bootstrapping-nix-a...

357 bytes for bootstrap compiler binary is VERY impressive!
If I remember correctly, this tiny binary is used to (reproducibly) bootstrap the next binary, which bootstraps the next binary, until eventually GCC can be compiled (and compile other software).

To be fair, it is 357 bytes ... plus a POSIX operating system.

Still, that POSIX operating system bit is also being worked on.

Isn't that what builder-hex0 does?

https://github.com/ironmeld/builder-hex0

At 357 bytes, do you need a reproducible binary at all?

I'd think one could hand-document all 357 bytes of machine code and have them be intelligible.

This[0] is basically the hand-documentation of those bytes then. Handwritten ELF header and assembly code.

[0] https://github.com/oriansj/bootstrap-seeds/blob/master/POSIX...

Just had a read of this to see what it did... And I must admit, I don't understand what purpose this is supposed to serve.

All it seems to do is convert hex into binary and dump it to a file. Not sure how that's any more useful than just copying the binary for the next stage directly; after all, this binary had to get on the system somehow.

The program does also dispose of comment lines.
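For anyone who wants to see how little that is, here is a minimal Python sketch of the behaviour described above: drop comments, ignore whitespace, turn hex pairs into bytes. It is not the actual hex0 code. The function name `hex0_decode` is made up for illustration, and treating `#` as the only comment marker is an assumption taken from this thread.

```python
def hex0_decode(source: str) -> bytes:
    """Convert commented hex text into raw bytes (illustrative sketch only)."""
    digits = []
    for line in source.splitlines():
        # Assumption: everything after '#' on a line is a comment.
        code = line.split("#", 1)[0]
        # Keep only the non-whitespace characters, i.e. the hex digits.
        digits.extend(ch for ch in code if not ch.isspace())
    # bytes.fromhex raises ValueError on stray characters or an odd digit count.
    return bytes.fromhex("".join(digits))


example = """
7F 45 4C 46   # the four ELF magic bytes
01 01 01      # a few more header bytes
"""
print(hex0_decode(example))
```

Writing the returned bytes to a file and marking it executable would be the remaining step.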

One could argue that this is just a kind of trick so they can say the next "binary" is actually a "source" file because it happens to be written by a human in ASCII.

Still, the phase distinction between what is a source and what is a binary becomes blurry at this low level. I believe the next stage is still written as ASCII-represented machine code with comments, but it allows labels and computes the offsets for jumps to those labels. Then more and more features are added until you have a minimal assembler that lets you write somewhat machine-independent code, and you continue working your way up the toolchain.

So at which point does the translation from "source" to "binary" become a real thing and not just a trick of semantics? Is it when we have a machine-independent assembly code? Is it when we computed offsets for labelled jumps? Is it when we started stripping comments out of the source code?
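To make the "labels and offsets" step concrete, here is a rough two-pass sketch in Python. The syntax is invented for illustration and is not the real hex1 format: `:name` defines a label, `@name` emits a one-byte offset relative to the byte after the reference, and every other token is assumed to be a single whitespace-separated hex pair.

```python
def assemble(source: str) -> bytes:
    """Resolve labels in commented hex text (hypothetical syntax, sketch only)."""
    # Tokenize, dropping '#' comments.
    tokens = []
    for line in source.splitlines():
        tokens.extend(line.split("#", 1)[0].split())

    # Pass 1: record the byte address at which each label is defined.
    labels, address = {}, 0
    for tok in tokens:
        if tok.startswith(":"):
            labels[tok[1:]] = address
        else:
            address += 1  # a hex pair and an @reference each occupy one byte

    # Pass 2: emit bytes, replacing each @reference with a relative offset.
    out = bytearray()
    for tok in tokens:
        if tok.startswith(":"):
            continue
        if tok.startswith("@"):
            # Offset is measured from the byte following the reference.
            offset = labels[tok[1:]] - (len(out) + 1)
            out.append(offset & 0xFF)  # one byte, two's complement
        else:
            out.append(int(tok, 16))
    return bytes(out)


program = """
:loop
90          # nop
EB @loop    # short jump back to the label
"""
print(assemble(program).hex())
```

The first pass exists only so that forward references work: a jump to a label defined later in the file cannot be resolved until every label's address is known.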

Yeah, I kind of agree, but my issue is kind of with this statement (in the link from the peer post):

> What if we could bootstrap our entire system from only this one hex0 assembler binary seed? We would only ever need to inspect these 500 bytes of computer codes. Every later program is written in a more friendly programming language: Assembly, C, … Scheme.

And my issue is that this isn't true. hex1 isn't written in assembler any more than hex0 is. Both of those bootstrap files can get onto the system simply by ignoring whitespace and anything after #, converting the hex into binary and writing it to a file.

Having hex0 doesn't add anything to the mix, other than being shorter than hex1, because you still have the same initial bootstrap problem of how you can prove that the hex0 binary represents the hex in its source vs the hex1 binary and its source and both have the same problem of needing to prove the hex in the source matches the assembly (and that the program even does what the comments claim).

hex1 is a more useful bootstrap point, because you can use standard system tools to create the binary from the source (e.g. sed) and also compile itself and verify that the files are the same.

Having hex0 and hex1 just means you need to manually verify both rather than just hex1.

I guess my point is that if you have insufficient trust in your system that you can't e.g. trust "sed" to create the original binary files, or trust the output of "dd -x" or "md5sum" to verify the binary files, you also can't trust it enough to verify that the hex in those source files is correct or that the binary files match.

> Having hex0 doesn't add anything to the mix, other than being shorter than hex1, because you still have the same initial bootstrap problem of how you can prove that the hex0 binary represents the hex in its source vs the hex1 binary and its source

Well presumably you toggle hex0 in on the front panel and then type hex1 with the keyboard, which is easier than toggling in the binary of hex1.

It’s the first stage. Likely piped. Hence the hex out. The context on how it’s called is key: https://github.com/oriansj/stage0-posix-x86/blob/e86bf7d304b...

Section 1.6.1 of the GNU Mes manual places these early-stage assemblers into context:

https://www.gnu.org/software/mes/manual/mes.html#Stage0

That’s just the first stage. Simple enough to be audited manually.

Or tattooed on oneself! Or etched on a dog tag!

Nix is a great dog name

I get cat vibes from "Nix".

> bootstrapping a full compiler toolchain from a single reproducible 357 byte binary without any other binary compiler blobs.

wtf that is mind-boggling. Thanks for the link.

Classic HN, the top comment in a Nix post is about Guix.

Nix has more packages and advocacy (even if the vast majority of people exposed to nix/guix will never actually use it), but Guix is a lot more interesting to me with the expressive power of scheme on offer.

That said, there are some sharp edges[0] that seem a bit harder to figure out (is this just as inscrutable/difficult as nix?).

Does anyone have some good links with people hacking/working with guix? Maybe some blogs to follow?

I care more about the server use-case and I'm a bit worried about the choice of shepherd over something more widely used like systemd and some of the other libre choices which make Guix what it is. Guix is fine doing what it does, but it seems rather hard to run a Guix-managed system with just a few non-libre parts, which is a non-starter.

Also, as mentioned elsewhere in this thread, the lack-of-package-signing-releases is kind of a problem for me. Being source and binary compatible is awesome, but I just don't have time to follow the source of every single dependency... At some point you have to trust people (and honestly organizations) -- the infrastructure for that is signatures.

Would love to hear of people using Guix in a production environment, but until then it seems like stuff like Fedora CoreOS/Flatcar Linux + going for reproducible containers & binaries is what makes the most sense for the average devops practitioner.

CoreOS/Flatcar are already "cutting edge" as far as I'm concerned for most ops teams (and arguably YAGNI), and it seems like Nix/Guix are just even farther afield.

[EDIT] Never mind, Guix has a fix for the signature problem, called authorizations![1]

[0]: https://unix.stackexchange.com/questions/698811/in-guix-how-...

[1]: https://guix.gnu.org/manual/devel/en/html_node/Specifying-Ch...

> Does anyone have some good links with people hacking/working with guix?

We've just had a conference about Guix in HPC: https://youtu.be/dT5S72x18R8

This is a recording of a stream for the second day with talks about large scale deployments of Guix System in HPC.

Thanks for this! Going to give it a watch :)

How long does a fully bootstrapped build take?

It obviously depends on the hardware, but IIRC for me maybe 3-4 hours building from the 357 byte seed to the latest GCC.

The early binaries are not very optimized :-)

With caching, just the time to download the artefact.

Doesn't caching completely defeat the point of bootstrapping? How do you know the cached artifact is correct? You have to build it manually to verify that, at which point you're still building manually...

Guix has tooling to verify binaries:

https://guix.gnu.org/en/manual/en/html_node/Invoking-guix-ch...

"guix build --no-grafts --no-substitutes --check foo" will force a local rebuild of package foo and fail if the result is not bit-identical. "guix challenge" will compare your local binaries against multiple cache servers.

I build everything locally and compare my results with the official substitute servers from time to time.

1. hash it

2. rebuild it without the cache

3. hash that

4. compare

Or, trust somebody who has. Inconvenient, but is there any other way to establish trust in the correspondence between code and a binary?
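The four steps above fit in a few lines of Python. This is only a sketch: the two file paths stand in for "the cached artifact" and "the one you rebuilt without the cache", the function names are made up, and SHA-256 is an assumption since the thread does not name a hash.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def is_reproducible(cached_path: str, rebuilt_path: str) -> bool:
    """Steps 1, 3 and 4: hash both artifacts and compare the digests."""
    return sha256_of(cached_path) == sha256_of(rebuilt_path)
```

Step 2, the rebuild itself, happens outside this snippet. The comparison only tells you anything if the build is bit-for-bit reproducible in the first place.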

You have a hash that n trusted parties agree on. This is enabled by reproducible builds.
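A sketch of what "n trusted parties agree" could look like in practice, in Python. The builder names, the `agreed_hash` function and the quorum rule are all hypothetical, not how any particular tool does it.

```python
from collections import Counter
from typing import Dict, Optional


def agreed_hash(reports: Dict[str, str], quorum: int) -> Optional[str]:
    """Return the digest reported by at least `quorum` builders, else None.

    `reports` maps a builder's name to the digest it published.
    """
    if not reports:
        return None
    digest, votes = Counter(reports.values()).most_common(1)[0]
    return digest if votes >= quorum else None


reports = {
    "builder-a": "abc123",  # hypothetical builders and digests
    "builder-b": "abc123",
    "builder-c": "ffff00",
}
print(agreed_hash(reports, quorum=2))
```

With a quorum above half the number of builders, at most one digest can ever satisfy the check, so a single dissenting or compromised builder cannot change the result.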