by Banana699 1688 days ago
You're right, PyPy _used_ to base its wizardry on PE (partial evaluation; I don't know whether it's Futamura or not, that's honestly the first time I've heard that term), but now they use something called a meta-tracing JIT: instead of JIT-tracing the program that your language's source describes, they JIT-trace your interpreter while it's running your language's source.
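For a concrete picture of "your interpreter running your language's source", here's a minimal sketch of a naive interpreter for a made-up toy language (the opcodes `ADD`, `DEC`, `JUMP_IF_POS`, `HALT` and the program are hypothetical), written so it stays valid Python. `JitDriver`, `jit_merge_point` and `can_enter_jit` are RPython's JIT hints from `rpython.rlib.jit`; the fallback class is just a do-nothing stand-in I'm assuming so the snippet also runs under plain CPython. Real interpreters generally need more hints than this to get good traces.

```python
# Use RPython's real JIT hints if available; otherwise a do-nothing stand-in
# (an assumption for this sketch) so the file still runs under plain CPython.
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver(object):
        def __init__(self, greens, reds):
            self.greens = greens
            self.reds = reds

        def jit_merge_point(self, **live_vars):
            pass

        def can_enter_jit(self, **live_vars):
            pass


# Greens identify a position in the user program; reds change every iteration.
jitdriver = JitDriver(greens=['pc', 'program'], reds=['acc', 'counter'])

# Hypothetical opcodes of a made-up toy language.
ADD, DEC, JUMP_IF_POS, HALT = 0, 1, 2, 3


def interpret(program, counter):
    pc = 0
    acc = 0
    while True:
        # Top of the interpreter's dispatch loop.
        jitdriver.jit_merge_point(pc=pc, program=program, acc=acc, counter=counter)
        opcode, arg = program[pc]
        if opcode == ADD:
            acc += counter
            pc += 1
        elif opcode == DEC:
            counter -= arg
            pc += 1
        elif opcode == JUMP_IF_POS:
            if counter > 0:
                backward = arg <= pc
                pc = arg
                if backward:
                    # A backward jump in the *user* program: a possible hot loop.
                    jitdriver.can_enter_jit(pc=pc, program=program, acc=acc, counter=counter)
            else:
                pc += 1
        elif opcode == HALT:
            return acc
        else:
            raise ValueError("unknown opcode")


# User-level program: acc = counter + (counter - 1) + ... + 1
SUM_PROGRAM = [(ADD, 0), (DEC, 1), (JUMP_IF_POS, 0), (HALT, 0)]
print(interpret(SUM_PROGRAM, 10))  # 55
```

Declaring `pc` and `program` as "green" tells the JIT they are constant for a given position in the user program, which is what lets it fold away the opcode fetch and dispatch when it traces this loop.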

The extremely cool and awesome thing about this is that it's effectively a general-purpose JIT: one JIT to rule all interpreted languages that could ever be written. There is nothing Python-specific in the toolchain. For _any_ interpreted language:

- You write only your naive-but-readable interpreter in RPython, a restricted subset of Python that tries to preserve the readability but ditches the dynamic madness. (This is not Python, it's an entirely different language; it just happens that every valid RPython program is also a valid Python program. There is nothing special about RPython here either: they could theoretically have picked any readable language to write your naive interpreter in, but they chose RPython.)

- The compilation pipeline produces two things: an executable image of your naive interpreter*, and a bytecode image of that same interpreter for the general-purpose JITer.

- Normally, it's the executable image of your interpreter that runs your language's programs, but once it detects a hot user-program-level loop (e.g. a backward jump that has been taken many times), it invokes the supporting runtime (the general-purpose JITer) and delegates to the bytecode version of itself.

- The GP JITer starts tracing the bytecode image of your interpreter (which, remember, is itself executing the user-level program the whole time). Once the interpreter arrives back at the start of the user-level loop, i.e. one full iteration is done, tracing stops. Now the general-purpose JITer has a record of all the operations your interpreter executed while running that user-level path. Strip out the interpreter-specific operations (which the GP JITer can identify, because that info is contained in the bytecode) and what's left is exactly {all the operations that the user-level path executed}.

- The GP JITer then treats the execution record like any other tracing JIT would: it produces an optimised native version from it, and bingo! You've got yourself a native image of that user-level loop.

- The original interpreter, the executable, now comes back into the picture. It puts that native version of the loop in its pocket, ready for the next time it encounters the loop.
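To make the steps above concrete, here's a toy simulation in plain Python of the same cycle: count backward jumps, start tracing once a loop is hot, record one iteration, "compile" the record, and cache the result per loop header. Everything here is an illustrative assumption, not how RPython is built: the threshold of 3, the toy opcodes, and using generated Python source (via `exec`) as a stand-in for machine code are all made up. In particular, the `record(...)` calls are written by hand inside the interpreter, whereas RPython derives the trace automatically from the interpreter's bytecode image, which is precisely the work meta-tracing saves you.

```python
THRESHOLD = 3  # hypothetical: backward jumps taken before tracing starts

# Opcodes of a made-up toy language (not PyPy's).
ADD, DEC, JUMP_IF_POS, HALT = range(4)


def compile_trace(lines):
    """Turn recorded trace lines into a Python function that loops over them.

    Generated Python source stands in for the machine code a real JIT emits.
    The function returns (acc, counter, resume_pc) when a guard fails.
    """
    body = "\n".join("        " + line for line in lines)
    src = "def loop(acc, counter):\n    while True:\n" + body + "\n"
    namespace = {}
    exec(src, namespace)
    return namespace["loop"]


class ToyMetaTracer(object):
    def __init__(self, program):
        self.program = program
        self.counters = {}   # loop-header pc -> times its backward jump was taken
        self.compiled = {}   # loop-header pc -> compiled loop ("in its pocket")
        self.traces = {}     # loop-header pc -> recorded trace, for inspection
        self.trace = None    # list of recorded lines while tracing, else None
        self.trace_start = None

    def record(self, line):
        # Only operations on the changing values (acc, counter) are recorded.
        # Dispatch on pc/opcode is constant within a trace, so it never shows up.
        if self.trace is not None:
            self.trace.append(line)

    def backward_jump(self, target):
        if self.trace is not None:
            if target == self.trace_start:
                # Back at the loop header: the trace covers one full iteration.
                self.traces[target] = self.trace
                self.compiled[target] = compile_trace(self.trace)
                self.trace = None
            return
        self.counters[target] = self.counters.get(target, 0) + 1
        if self.counters[target] >= THRESHOLD:
            self.trace = []  # hot loop: record the next iteration
            self.trace_start = target

    def run(self, counter):
        pc, acc = 0, 0
        while True:
            loop = self.compiled.get(pc)
            if loop is not None and self.trace is None:
                # Fast path: run the cached loop until one of its guards fails.
                acc, counter, pc = loop(acc, counter)
                continue
            opcode, arg = self.program[pc]
            if opcode == ADD:
                acc += counter
                self.record("acc += counter")
                pc += 1
            elif opcode == DEC:
                counter -= arg
                self.record("counter -= %d" % arg)
                pc += 1
            elif opcode == JUMP_IF_POS:
                if counter > 0:
                    # Guard: if the branch goes the other way later, bail out.
                    self.record("if not (counter > 0): return acc, counter, %d" % (pc + 1))
                    if arg <= pc:
                        self.backward_jump(arg)
                    pc = arg
                else:
                    self.record("if counter > 0: return acc, counter, %d" % arg)
                    pc += 1
            elif opcode == HALT:
                self.trace = None  # abandon a trace that never closed its loop
                return acc
            else:
                raise ValueError("unknown opcode")


SUM_PROGRAM = [(ADD, 0), (DEC, 1), (JUMP_IF_POS, 0), (HALT, 0)]
tracer = ToyMetaTracer(SUM_PROGRAM)
print(tracer.run(10))    # 55, traced and compiled on the 4th iteration
print(tracer.traces[0])
print(tracer.run(5))     # 15, reusing the cached loop from the first call
```

The recorded trace holds only the three operations on `acc` and `counter`; the opcode fetch and dispatch on `pc` never appear, mirroring how interpreter overhead disappears from a meta-traced loop. A failing guard returns the `pc` where plain interpretation should resume.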

It's so f*ing cool. That's why their logo is a snake eating itself: there are so many meta shenanigans going on. Their implementation of Python is merely the application; the amazing toolchain they built to build it is the real treasure.

*: One of the steps in creating the executable is, I kid you not, running the standard CPython interpreter on your RPython source (since it's valid Python), waiting for the interpreter to do its expensive startup, then freezing the whole environment it produced and packaging it with the executable. This couldn't be done to speed up normal Python programs because their extremely dynamic nature gets in the way.
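A rough illustration of why the freezing trick works: as I understand it, RPython's restrictions only apply to what the program does after import, so import-time code can use all of Python's dynamism. The sketch below (names are hypothetical, and I haven't run it through the RPython translator) generates its operation table with `exec()` at import time; the run-time function only indexes into the already-built list, which is the kind of thing a static analysis can handle.

```python
# --- Import time: unrestricted Python -----------------------------------
# Hypothetical example: generate one function per operator with exec().
# exec() is far too dynamic for RPython's static analysis, but this runs only
# once, while the module is imported, before anything gets frozen.
_OP_SPECS = [("add", "+"), ("sub", "-"), ("mul", "*")]


def _generate(name, symbol):
    namespace = {}
    source = "def op_%s(a, b):\n    return a %s b\n" % (name, symbol)
    exec(source, namespace)
    return namespace["op_" + name]


OPS = [_generate(name, symbol) for name, symbol in _OP_SPECS]
ADD, SUB, MUL = range(len(OPS))


# --- Run time: the part that has to stay within RPython -----------------
def apply_op(opcode, a, b):
    # Only indexes a list of already-built functions and calls one of them.
    return OPS[opcode](a, b)


print(apply_op(ADD, 2, 3))  # 5
print(apply_op(MUL, 4, 5))  # 20
```

The frozen result is just a list of three ordinary functions; how they were produced at import time no longer matters to whatever analyses the program afterwards.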