Hacker News new | ask | show | jobs
by ludocode 665 days ago
It can be difficult to explain why bootstrapping is important. I put a "Why?" section in the README of my own bootstrapping compiler [0] for this reason.

Security is a big reason and it's one the bootstrappable team tend to focus on. In order to avoid the trusting trust problem and other attacks (like the recent xz backdoor), we need to be able to bootstrap everything from pure source code. They go as far as deleting all pre-generated files to ensure that they only rely on things that are hand-written and auditable. So bootstrapping Python for example is pretty complicated because the source contains code generated by Python scripts.

I'm much more interested in the cultural preservation aspect of it. We want to preserve contemporary media for future archaeologists, for example in the Arctic World Archive [1]. Unfortunately it's pointless if they have no way to decode it. So what do we do? We can preserve the specs, but we can't really expect them to implement x265 and everything else they would need from scratch. We can preserve binaries, but then they'd need to either get thousand-year-old hardware running or virtualize a thousand-year-old CPU. We can give them, say, a definition of a simple Lisp, and then give them code that runs on that, but then who's going to implement x265 in a basic Lisp? None of this is really practical.

That's why in my project I made a simple virtual machine, then bootstrapped C on top of it. It's trivially portable, not just to present-day architectures but to future and alien architectures as well. Any future archaeologist or alien civilization could implement the VM in a day, then run the C bootstrap on it, then compile ffmpeg or whatever and decode our media. There are no black boxes here: it's all debuggable, auditable, open, handwritten source code.

[0]: https://github.com/ludocode/onramp?tab=readme-ov-file#why-bo...

[1]: https://en.wikipedia.org/wiki/Arctic_World_Archive

2 comments

Yep, I think this would have been good context in the OP
Say you start with nothing but "pure source code".

With what tool do you process that source code?

The minimum tool that bootstrapping projects tend to start with is a hex monitor. That is, a simple-as-possible tool that converts hexadecimal bytes of input into raw bytes in memory, and then jumps to it.

You need some way of getting this hex tool in memory of course. On traditional computers this could be done on front panel switches, but of course modern computers don't have those anymore. You could also imagine it hand-woven into core rope memory for example, which could then be connected directly to the CPU at its boot address. There are many options here; getting the hex tool running is very platform-specific.

Once you have a hex tool, you can then use that to input the next stage, which is written in commented hexadecimal source code. The next tool then adds a few features, and so does the tool after that, and so on, eventually working your way up to assembly and C.

From the point of view of trust and security, bootstrapping has to be something that's easily repeatable by everyone, in a reasonable amount of time and steps, with the same results.

Not to mention using only the current versions of all the deliverables or at most one version back.