by userbinator 3734 days ago
Yes, this is C4-style stack-based code generation followed by postprocessing into ARM instructions.

> Everything we did to it to make it work more realistically made me feel like we were ruining it and missing its point.

On the contrary, I think the parser (and tokeniser) is the most interesting part, as the codegen in C4 was basically "for free" since it generates as it parses. The parser is amazingly simple yet featureful for its size, and that's what makes it a good starting point. It could easily be made to generate AST nodes instead of stack instructions, and then you have the beginnings of a "real" compiler. The simplicity makes it easy to start "hacking on" and extend/modify, because it's straightforward to understand where everything is.
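To make the "AST nodes instead of stack instructions" point concrete, here is a minimal sketch. Everything in it (`struct Node`, `new_node`, `gen`, the `OP_*` opcodes) is made up for illustration and is not taken from C4 or amacc; the stack machine is also simpler than C4's real instruction set. The parser would call `new_node` where it currently emits an instruction, and a separate pass would walk the tree to produce the same kind of stack code:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical opcodes for a toy stack machine (not C4's real opcode set). */
enum Op { OP_IMM, OP_ADD, OP_MUL };

/* Hypothetical AST node kinds. */
enum NodeKind { NODE_NUM, NODE_ADD, NODE_MUL };

struct Node {
    enum NodeKind kind;
    int value;               /* only meaningful for NODE_NUM */
    struct Node *lhs, *rhs;  /* only meaningful for binary nodes */
};

/* What the parser would call instead of emitting an instruction directly. */
struct Node *new_node(enum NodeKind kind, int value,
                      struct Node *lhs, struct Node *rhs)
{
    struct Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    n->value = value;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Code generation as its own pass: a post-order walk that writes stack
 * instructions into code[] starting at pos and returns the new position. */
int gen(const struct Node *n, int *code, int pos)
{
    if (n->kind == NODE_NUM) {
        code[pos++] = OP_IMM;
        code[pos++] = n->value;
        return pos;
    }
    pos = gen(n->lhs, code, pos);
    pos = gen(n->rhs, code, pos);
    code[pos++] = (n->kind == NODE_ADD) ? OP_ADD : OP_MUL;
    return pos;
}
```

Once the tree exists, things like constant folding or a different backend become passes over `struct Node` rather than changes to the parser.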

The one idea I have for the parser is to refactor it to be table-driven, indexed by precedence level, instead of one large switch with lots of very similar code in each case. At a glance it looks like this version can compile itself, and already supports structures, arrays of structures, and maybe function pointers.
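Roughly what I mean by table-driven, as a standalone sketch rather than a patch. The table contents and all names here (`op_table`, `find_op`, `parse_expr`, `eval_string`) are my own assumptions, it only handles single-character, left-associative binary operators on integers, and it evaluates directly instead of generating code, just to keep it short:

```c
#include <ctype.h>
#include <stddef.h>

/* One row per binary operator: its character and how tightly it binds.
 * Illustrative table only; a real one would also cover multi-character
 * operators, assignment, and the conditional operator. */
struct OpInfo { char op; int prec; };

static const struct OpInfo op_table[] = {
    { '|', 1 }, { '^', 2 }, { '&', 3 },
    { '+', 4 }, { '-', 4 },
    { '*', 5 }, { '/', 5 }, { '%', 5 },
};

static const struct OpInfo *find_op(char c)
{
    for (size_t i = 0; i < sizeof op_table / sizeof op_table[0]; i++)
        if (op_table[i].op == c)
            return &op_table[i];
    return NULL;
}

static int apply(char op, int a, int b)
{
    switch (op) {
    case '|': return a | b;
    case '^': return a ^ b;
    case '&': return a & b;
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    case '/': return b ? a / b : 0;  /* sketch: no error reporting */
    case '%': return b ? a % b : 0;
    }
    return 0;
}

static const char *src;  /* cursor into the input text */

static void skip_ws(void)
{
    while (isspace((unsigned char)*src))
        src++;
}

static int parse_expr(int min_prec);

/* A primary is a decimal literal or a parenthesised expression. */
static int parse_primary(void)
{
    int v = 0;
    skip_ws();
    if (*src == '(') {
        src++;
        v = parse_expr(1);
        skip_ws();
        if (*src == ')')
            src++;
        return v;
    }
    while (isdigit((unsigned char)*src))
        v = v * 10 + (*src++ - '0');
    return v;
}

/* Precedence climbing: one loop handles every level by consulting the table. */
static int parse_expr(int min_prec)
{
    int lhs = parse_primary();
    for (;;) {
        skip_ws();
        const struct OpInfo *info = find_op(*src);
        if (info == NULL || info->prec < min_prec)
            return lhs;
        src++;
        /* prec + 1 makes every operator left-associative */
        int rhs = parse_expr(info->prec + 1);
        lhs = apply(info->op, lhs, rhs);
    }
}

int eval_string(const char *text)
{
    src = text;
    return parse_expr(1);
}
```

Adding an operator then means adding a row to `op_table` and a case to `apply`, instead of another near-duplicate branch in the parser.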

In any case it's great to see more little compilers that are so close to "real" ones in functionality.

Edit: yes, it does self-compile. From the Makefile:

    ./amacc amacc.c tests/hello.c
1 comments

It's postprocessing to ARM, but it's using ARM to implement a stack-based runtime, right?

Just to be a little clearer about my concern about the C4 codebase:

C4 isn't just written in its idiosyncratic style in order to be smaller; it's also designed to compile the minimal subset of C required to self-host. For instance, global variables and, in particular, global arrays are there because C4 didn't parse structs.

This compiler is inheriting those design decisions, but has discarded the goal of compiling a minimal subset of C, so its design is a little incoherent. (Ours was too!)

C4 is more like a piece of sculpture than it is a real compiler.