Hacker News new | ask | show | jobs
by 0x7f800000 2992 days ago
But can the interpreter interpret itself?
5 comments

Does anyone have a history of self-hosting compilers / interpreters? I bet there is more to flesh out here than what Wikipedia has to offer:

https://en.wikipedia.org/wiki/Bootstrapping_(compilers)#Hist...

I think so? It's been a long time, and I don't remember exactly what I did while playing around with it. At https://github.com/darius/tailbiter I ported it to Python 3.4 and stripped it down to be in a subset of Python accepted by my self-hosting Python-to-bytecode compiler so that the compiler plus the interpreter could reproduce itself. There's a companion article linked there.
This was my question as well. I believe the scope of this project is to interpret Python bytecode using Python.
Obligatory exec() mention.
It isn't a LISP :P
There is, however, http://www.aosabook.org/en/pypy.html , which is indeed self-hosting.
You don't need a language to be a LISP in order for them to be able to interpret themselves. There are several Java interpreters which can interpret themselves, for example.
You just need it to be a Lisp in order to do it in such a way that you get confused whether you're in the interpreted language or the host one.
Byterun confuses host and interpreter in plenty of places.

It uses the host Python for attribute resolution (all code in getters and setters runs on the host), exception handling (which combines with the previous point to make some NameErrors impossible to catch) and even function calls (but all functions are wrapped so that their _call__ switches back to byterun as the interpreter).

Because of that confusion, byterun isn't very useful if you want to be really independent from the host interpreter; it's just too easy to escape from the VM. As a learning exercise however, it is helpful for understanding Python's innards at a level higher than C.

I mean actually getting confused.
Do they interpret java, the programming language, or java bytecode? Not the same thing. lisp being homoiconic, it's much easier to implement a metaciruclar evaluator.

https://en.wikipedia.org/wiki/Homoiconicity

If I gave you a function that accepted Java source code as text and interpreted it, how would you detect that I was first converting it to Java bytecode and then interpreting that bytecode?

You couldn't, so it's an irrelevant internal detail. The function interprets Java source code.

> If I gave you a function that accepted Java source code as text and interpreted it, how would you detect that I was first converting it to Java bytecode and then interpreting that bytecode?

> You couldn't, so it's an irrelevant internal detail. The function interprets Java source code.

In a Lisp interpreter the actual source code is Lisp data, not text. One can give the function access to its source code, allow it to inspect it or even to change it - while the interpreter is executing this source code.

In a Lisp interpreter, the code can even modify the source code (we are not talking about the byte-code or machine code) while it is running.

    CL-USER 20 > (let ((code (copy-tree '(+ 1 2 bar))))
                   `(defun foo (bar)
                      (print ,code)
                      (unless (eq (first ',code) '-)
                        (setf (first ',code) '-)
                        (foo bar))
                      (values)))
    (DEFUN FOO (BAR) (PRINT (+ 1 2 BAR)) (UNLESS (EQ (FIRST (QUOTE (+ 1 2 BAR))) (QUOTE -)) (SETF (FIRST (QUOTE (+ 1 2 BAR))) (QUOTE -)) (FOO BAR)) (VALUES))

    CL-USER 21 > (eval *)
    FOO

    CL-USER 22 > (foo 41)

    44 
    -42 
As you can see the function modified itself so that the operator + was replaced with a - and then called itself again with the same argument.

If we now look at the function, we can see that it was indeed changing itself:

    CL-USER 23 > (pprint (function-lambda-expression #'foo))

    (LAMBDA (BAR)
      (DECLARE (SYSTEM::SOURCE-LEVEL #<EQ Hash Table{0} 41B04012D3>))
      (DECLARE (LAMBDA-NAME FOO))
      (PRINT (- 1 2 BAR))
      (UNLESS (EQ (FIRST '(- 1 2 BAR)) '-) (SETF (FIRST '(- 1 2 BAR)) '-) (FOO BAR))
      (VALUES))
The PRINTED expression value is now computed with the - operator.

The interpreted execution itself allows also different things then the compiled execution. For example if we have a break point, we can see the actual source code currently executed and we can change it while in the break point and then resume execution. Since the interpreter runs the source code, we can also can modify the interpreter to do something else with the source code, while it is executing it - like recording it or tracing it or stepping it.

Yeah I know about how Lisp works - I have a PhD in metaprogramming using interpreters and meta-circular compilers. I just don't agree it's really materially different.
I agree with that.

A compiler takes code to transform it into another medium: binary executable, code (transpiler), ...

A interpreter takes code to evaluate it.

All the self proclaimed meta circular or "self-hosted" implementations of java that i have seen actually require an external compiler (eg: javac from the JDK). They are in reality implementations of the java virtual machine, not the language; You can't feed java code to them to execute. they are not even self-hosted.

Not sure, but I think I read recently (maybe in the Rebol docs) that it is also a homoiconic language. And I think I also read that Rebol programs can interpret chunks of Rebol code passed to them.
Doesn't need to be a Lisp. All kinds of interpreters can interpret themselves if they're written in the same language -- similar to bootstraping a compiler.
It was a geek joke, I'm not serious. I've written plenty of both languages, even in .gov.

Why so serious?!