Hacker News new | ask | show | jobs
by eighthnate 3218 days ago
If people find getting started on a compiler to be a bit too intimidating, one good way to get your feet wet is implementing an interpreter for small subset of a language. Perhaps the basic arithmetic part of adding/multiplying/dividing integers.
4 comments

People finding compilers intimidating is exactly my audience. :)

http://www.t3x.org/t3x/book.html

Please excuse the shameless plug!

I have been reading the T3X book and code and I have to say that I am a huge fan of your work. I haven't finished reading it yet, but I have thoroughly enjoyed it so far! I can do successful 'make test' on Linux and NetBSD, but a trivial T3X program seems to misbehave on NetBSD. I look forward to getting to the bottom of it.

A quick question, if you don't mind: why do t.c and t.t differ slightly in the code emitted for t.memcmp, t.copy and t.fill? Thanks!!

Thanks! I'm glad you like my work!

The two compilers differ because I stopped applying non-essential modifications to t.c as soon as it was good enough to bootstrap t.t. There are also some edges cases that t.c does not catch. I think it's best to not use it for anything but bootstrapping!

Let me know about that misbehaving NetBSD program when you find out what caused it -- or even if you don't!

I own a number of the Nils Holm books on language implementation and they are all wonderful in a compact, approachable and real-world way that text books are not. A great companion to any programming language course.
I disagree. I tried that approach for many years but without external input, I could never figure out how to transition from a simple expression language to a proven, working compiler architecture. While large compiler architectures work well for smaller languages, the opposite is not true.

I have found it much better to pick a good introductory text and just work through the exercises.

You can turn an interpreter into a compiler by replacing all code that actually does something by code that prints out the code that does it in the target language. It takes a bit to wrap your head around it, and you won't get an optimizing compiler, but a compiler it will be.

So your values are no longer values in the interpreter's language, but descriptions in the target language for getting that value. To compile an expression, you first handle the sub-expressions, as in an interpreter, which prints the code to compute them. Additionally, you get such a value description for the return value of each expression. Then you can use those descriptions to print out code to get them into known locations (e.g. registers). Then you can print out code to perform your operation on the values in those locations and put the result in another location (e.g. on the stack). The return value of this compilation step is the description of that location.

I like your comment so please don't interpret my remark as evidence of the contrary. I simply think you misunderstood me.

I didn't mean to say that it's hard to transition from interpreters to compilers, only that I found it hard to transition from interpreting/compiling an arithmetic language to a full structured programming language with general recursion, loops, data structures, variables and so on.

Since I'm not sure how understandable the explanation was, here is an example, a simple interpreter for a Lisp-like language, using Python lists as the syntax tree.

    def interpret(state, function_table, expression):
        function = function_table[expression[0]]
        return function(state, function_table, *expression[1:])
    
    def interpret_all(state, function_table, expressions):
        return [interpret(state, function_table, e) for e in expressions]
    
    interpreter_table = {
        '+': lambda s, ft, *es: reduce(lambda a, b: a + b, interpret_all(s, ft, es)), # sum
        '*': lambda s, ft, *es: reduce(lambda a, b: a * b, interpret_all(s, ft, es)), # product
        '!': lambda s, ft, k, v: s.update({k: interpret(s, ft, v)}),                  # set variable
        '?': lambda s, ft, k: s[k],                                                   # get variable
        ';': lambda s, ft, *es: interpret_all(s, ft, es)[-1],                         # execute a sequence and get the last value
    }
An example program for 'x = a * (b + a), return x * x' looks like this:

    program = [';',
                ['!', 'x', ['*', ['?', 'a'], ['+', ['?', 'b'], ['?', 'a']]]],
                ['*', ['?', 'x'], ['?', 'x']]]
You can execute the interpreter with initial values for a and b like this:

    >>> interpret({'a': 10, 'b': 20}, interpreter_table, program)
    90000
The compiler can be implemented by swapping out the interpreter_table with different functions to print code instead of executing it:

    def fresh_variable(state):
        next_tmp = state.get('__next_tmp__', 0)
        var = 'tmp_%s' % next_tmp
        state['__next_tmp__'] = next_tmp+1
        return var
    
    def compile_sum(state, a, b):
        var = fresh_variable(state)
        print(var+' = '+a+' + '+b)
        return var
    
    def compile_product(state, a, b):
        var = fresh_variable(state)
        print(var+' = '+a+' * '+b)
        return var
    
    def compile_set(state, key, value):
        state[key] = key
        print(key+' = '+value)
    
    compiler_table = {
        '+': lambda s, ft, *es: reduce(lambda a, b: compile_sum    (s, a, b), interpret_all(s, ft, es)), # sum
        '*': lambda s, ft, *es: reduce(lambda a, b: compile_product(s, a, b), interpret_all(s, ft, es)), # product
        '!': lambda s, ft, k, v: compile_set(s, k, interpret(s, ft, v)),                                 # set variable
        '?': lambda s, ft, k: s[k],                                                                      # get variable
        ';': lambda s, ft, *es: interpret_all(s, ft, es)[-1],                                            # execute a sequence and get the last value
    }
Then you can easily compile the program:

    >>> interpret({'a': '10', 'b': '20'}, compiler_table, program)
    tmp_0 = 20 + 10
    tmp_1 = 10 * tmp_0
    x = tmp_1
    tmp_2 = x * x
    'tmp_2' # this is the return value, telling you where to find the result
Depending on the syntax of your target language, control flow might require additional compiler state. In Python, I would have to store the current indentation level. If someone is interested, I could show how to do if-statements.
Somewhat like this but taking it further: https://github.com/darius/toot
And ideally, your compiler will turn that interpreter into a compiler :)
Close parallels to partial evaluation and the Futamura projections. https://en.wikipedia.org/wiki/Partial_evaluation#Futamura_pr...
What do you use as an x86 code generator for such purposes ?