That's a different kind of bootstrapping though. The OP is about building up to a toolchain from almost nothing using only source code and only relying on some tiny binaries that could be hand-assembled or verified. And it also uses a version of TinyCC along the way before it arrives to gcc.
I do plan to integrate this, or something very similar, into live-bootstrap. This is essentially the type of thing we would do to compile the Linux kernel within the bootstrap, which has not been done yet. However tccboot/derivative needs to be compiled from source within live-bootstrap which is not simple.
One nice thing about the work showcased here is that it bootstraps TCC on its journey to GCC. At some point we could see attempts to bootstrap tccboot as a kind of "escape pod" from other systems, and after booting tccboot, have the bootstrapping work documented here continue from TCC onward.
While we can definitely discuss whether it's practical for anyone to actually audit all that source code (no it is not), proving a 356 bytes codestream isn't malicious seems like a good foundation to argue about.
I wonder if Brainfuck could be used for https://github.com/oriansj/stage0-posix ? It would not surprise me if there is no other language for which there are so many interpreters written in so many different programming languages. It is even possible to write a Brainfuck interpreter in Brainfuck, which can be verified. And there is also a Brainfuck interpreter written in x86-64: https://github.com/316k/brainfuck-x86-64 . It is a little larger than hex0_x86.hex0 , but not too much to make it hard to verify.
The dataflow is not totally clear to me—am I understanding correctly that the parser generator is implemented by a recursive-descent parser implemented in 1400 lines of JS, which then compiles the example grammar in "grammar" into the top-down Forthish code in "grammar_error", which is then used to compile the Algolish code in "input" into the Lispish code in "output", which is then compiled into the BF in "result"?
While this is an inspiring and aesthetically delightful achievement, it's not clear that it helps with the hex0 seeds, because executing the 1400-line recursive-descent parser requires the seed to grow by the size of one JavaScript execution engine; all the existing ones are about four orders of magnitude larger than the live-bootstrap seed. Lacking that, a malicious JS execution engine could inject malicious code into the BF output, in the Karger–Thompson-attack scenario.
I suspect that the initial grammar compilation could be achieved by a cut-down compiler-compiler along the lines of what I'm working on in http://canonical.org/~kragen/sw/dev3/meta5ix.m5, although Meta5ix itself is not currently bootstrappable from a human-written seed.
My idea was not to use the input as the base of the seed, but the generated BrainFuck program. The idea is that there are a multitude of BrainFuck interpreters/compilers in a multitude of programming languages that one could use verify that they are not malicious.
Also, I think it is easier to verify that a BrainFuck program is not malicious than a program in machine code. The x86 instruction set is rather large, I understand, with all its extensions. I understand that only a very restricted subset of instructions is used in the seed, but one still needs to verify this requiring to understand the x86 instruction documentation. Verifying the BrainFuck program is not easy, but it does not require the access to some external document. Besides the definition of the language, only an ASCII table, which would also be needed for x86 seed, BTW.
The performance argument is not a real issue if it is only used as a replacement for sed/xxd combination. And yes, it was designed to make programming difficult, but one could argue that it is easier to verify manually than machine code, because machine code requires you to know the whole x86 instruction set, which I understand, is rather large, taking into account all the extensions. I suspect that these exentions are not used in the stage0 code, but we still have to verify that they are not used.
The kernel is still an unsolved problem, unfortunately. We need an appropriate, small seed kernel to be able to successfully run everything up until we can recompile linux (which also has proved difficult and is not why it is in the repo yet) :(
I believe this is this is the dependency chain your live-bootstrap works through: https://github.com/oriansj/talk-notes/raw/master/live-bootst...