Hacker News new | ask | show | jobs
by kqr 3218 days ago
I disagree. I tried that approach for many years but without external input, I could never figure out how to transition from a simple expression language to a proven, working compiler architecture. While large compiler architectures work well for smaller languages, the opposite is not true.

I have found it much better to pick a good introductory text and just work through the exercises.

1 comments

You can turn an interpreter into a compiler by replacing all code that actually does something by code that prints out the code that does it in the target language. It takes a bit to wrap your head around it, and you won't get an optimizing compiler, but a compiler it will be.

So your values are no longer values in the interpreter's language, but descriptions in the target language for getting that value. To compile an expression, you first handle the sub-expressions, as in an interpreter, which prints the code to compute them. Additionally, you get such a value description for the return value of each expression. Then you can use those descriptions to print out code to get them into known locations (e.g. registers). Then you can print out code to perform your operation on the values in those locations and put the result in another location (e.g. on the stack). The return value of this compilation step is the description of that location.

I like your comment so please don't interpret my remark as evidence of the contrary. I simply think you misunderstood me.

I didn't mean to say that it's hard to transition from interpreters to compilers, only that I found it hard to transition from interpreting/compiling an arithmetic language to a full structured programming language with general recursion, loops, data structures, variables and so on.

Since I'm not sure how understandable the explanation was, here is an example, a simple interpreter for a Lisp-like language, using Python lists as the syntax tree.

    def interpret(state, function_table, expression):
        function = function_table[expression[0]]
        return function(state, function_table, *expression[1:])
    
    def interpret_all(state, function_table, expressions):
        return [interpret(state, function_table, e) for e in expressions]
    
    interpreter_table = {
        '+': lambda s, ft, *es: reduce(lambda a, b: a + b, interpret_all(s, ft, es)), # sum
        '*': lambda s, ft, *es: reduce(lambda a, b: a * b, interpret_all(s, ft, es)), # product
        '!': lambda s, ft, k, v: s.update({k: interpret(s, ft, v)}),                  # set variable
        '?': lambda s, ft, k: s[k],                                                   # get variable
        ';': lambda s, ft, *es: interpret_all(s, ft, es)[-1],                         # execute a sequence and get the last value
    }
An example program for 'x = a * (b + a), return x * x' looks like this:

    program = [';',
                ['!', 'x', ['*', ['?', 'a'], ['+', ['?', 'b'], ['?', 'a']]]],
                ['*', ['?', 'x'], ['?', 'x']]]
You can execute the interpreter with initial values for a and b like this:

    >>> interpret({'a': 10, 'b': 20}, interpreter_table, program)
    90000
The compiler can be implemented by swapping out the interpreter_table with different functions to print code instead of executing it:

    def fresh_variable(state):
        next_tmp = state.get('__next_tmp__', 0)
        var = 'tmp_%s' % next_tmp
        state['__next_tmp__'] = next_tmp+1
        return var
    
    def compile_sum(state, a, b):
        var = fresh_variable(state)
        print(var+' = '+a+' + '+b)
        return var
    
    def compile_product(state, a, b):
        var = fresh_variable(state)
        print(var+' = '+a+' * '+b)
        return var
    
    def compile_set(state, key, value):
        state[key] = key
        print(key+' = '+value)
    
    compiler_table = {
        '+': lambda s, ft, *es: reduce(lambda a, b: compile_sum    (s, a, b), interpret_all(s, ft, es)), # sum
        '*': lambda s, ft, *es: reduce(lambda a, b: compile_product(s, a, b), interpret_all(s, ft, es)), # product
        '!': lambda s, ft, k, v: compile_set(s, k, interpret(s, ft, v)),                                 # set variable
        '?': lambda s, ft, k: s[k],                                                                      # get variable
        ';': lambda s, ft, *es: interpret_all(s, ft, es)[-1],                                            # execute a sequence and get the last value
    }
Then you can easily compile the program:

    >>> interpret({'a': '10', 'b': '20'}, compiler_table, program)
    tmp_0 = 20 + 10
    tmp_1 = 10 * tmp_0
    x = tmp_1
    tmp_2 = x * x
    'tmp_2' # this is the return value, telling you where to find the result
Depending on the syntax of your target language, control flow might require additional compiler state. In Python, I would have to store the current indentation level. If someone is interested, I could show how to do if-statements.
Somewhat like this but taking it further: https://github.com/darius/toot