Hacker News new | ask | show | jobs
by metalliqaz 2953 days ago
I am aware of PyPy but have not used it myself. My understanding of PyPy is that it gains performance improvements mainly through a hotspot JIT compiler. If Cannoli compiles the entire Python program down to machine code (via rust) then how does PyPy "blow it away"?
4 comments

As others have commented, AOT compilation is limited to the information available at compile time. Various features of Python like dynamic typing and object/class mutation (via del) preclude many static analysis techniques. In Cannoli, this meant that the compiler had to also generate code that manages scope at run time. Whenever an identifier was encountered in the compiled code a hashmap would be searched to find the bound value. This overhead becomes expensive, and the thesis covers optimizations that avoid this. PyPy's JIT operates on the PyPy interpreter itself, finding linear lists of operations that are frequently used. It can then compile these operations to bytecode so the next time that trace is encountered it can execute the compiled code. The self-analysis at run time provides information that an AOT compiler just doesn't have.

That being said, I did leave a few suggestions in the "future work" section that talk about writing an AOT compiler for RPython (the version of Python that PyPy's interpreter is written in). This would provide more information at compile time and would be an interesting comparison between a Python interpreter compiled AOT versus a Python interpreter with a JIT (PyPy).

> As others have commented, AOT compilation is limited to the information available at compile time

Does this mean that a AOT compiler could get a lot faster with pep 484 type hints?

Yes :)! So I mention PEP 3107 (https://www.python.org/dev/peps/pep-3107/) in the thesis. This allows type annotations in the function signature. Cannoli leverages both type annotations in function signatures and assignments to output optimized code. Other projects like Pythran also use type annotations.

PEP 3107 does say:

> By itself, Python does not attach any particular meaning or significance to annotations.

However, I think this will change especially as more projects begin to outperform CPython. In the "Results > Object Optimization" section of the thesis paper, I cover using these very type annotations to optimize the code.

The biggest problem with Python annotations right now is that they don't really mean anything. Nothing is really enforced so it is totally valid to have 'x : int = "string"'. The compiler would have to just ignore this annotation since it was provided the wrong data. This could also be difficult to identify if a variable was being used and its type mislabeled. So it's not perfect but I think it's a step in the right direction.

Only if you change the semantics of the language (e.g. raise an exception if you pass in a value of the wrong type). Otherwise, the type hints probably aren't super useful because the language doesn't actually enforce them.

Edit: That said, if the compiler can determine with certainty that the type signature is always obeyed, then yes, you could apply optimizations to remove a lot of the runtime overhead. I imagine this would be rather difficult to do unless you have type annotations for all (or the vast majority) of your code, including third-party libraries.

JIT'ers tend to be aggressive and disregard language semantics. As far as I know, the java hotspot compiler goes ahead and inlines implementations of calls on generic interfaces, as long as a couple of hundred calls to into the same implementation method. For large methods, it might not inline, but it'll remove the conditional jump based on the type with a direct jump.

That's obviously wrong, examples to show that are trivial. But that's why the direct call also contains a trap to check the type to perform de-optimization if the assertion appears to be wrong. But a simple 'assert type byte sequence is a known value' is still faster than a dynamic jump. A lot faster.

> That's obviously wrong, examples to show that are trivial.

Does that actually affect the behavior of the code, though? That seems to me like a private implementation detail, but I don't know much about the HotSpot JIT or the JVM in general.

Instead of an exception you could fall back to a generic version of the function/block, like deoptimization in JITs.
Deoptimization is actually really hard to implement if you have an ahead of time compiler like Cannoli. You need to get all the stuff that is living in machine registers or in the C stack and then convert them back to whatever representation your generic interpreter uses.

I think this is actually one of the things that most get in the way if you want to use traditional AOT compiler technology (like gcc or LLVM) to implement a JIT. In state of the art JIT compilers this part is always a nest of highly complex and nonportable assembly language.

Compiling to machine code is not a panacea for optimization. A optimized JIT compiler is going to blow an AOT compiler out of the water. Being smart about the machine code generated is significantly more important than generating machine code. In particular, PyPy makes several optimizations over python code that a more direct implementation of CPython at the machine level probably wouldn't. For example, PyPy erases dictionary lookups for object member access if the object shape is statically known. Given how prevalent this kind of lookup is in Python code, it's possible that even an interpreter that made this optimization would be faster than a machine code version that used an actual hash table.

I think this compiler also makes this particular optimization, but this is just one of many many optimizations PyPy does. I imagine that with sufficient work, this compiler could be brought up to speed with PyPy, but as it stands right now, PyPy simply benefits from having years of optimization work that a new project doesn't.

For most dynamic languages, the available speedups aren't in simple compilation, but in removing the runtime type checks, method lookups, and other slow operations. This needs the ability to guess at what the code is going to do based on past behavior, and generate specialized versions that get thrown away if the guesses are invalidated.

So, for example, you might see that the last 100 calls to a function were done with integers, so you can generate a variant of the function that only works for integers, and check if it's applicable when you enter the function. If that function stops getting used, you can throw it away.

Doing that well ahead of time requires an extremely good idea of how the program will behave at run time, and even with good information, is still very likely to bloat up your binary hugely. (Facebook used to compile their PHP codebase to a multi-gigabyte binary before moving to HHVM, for example).

Actually I think you answered a question that I already asked about Rust being faster than C. If you don't need to carry out as many checks then I see how that will speed things up.
JITs get to analyze both code and data and optimize for each machine deployed to. A static compiler can only analyze code and the machine used for compilation. If dependencies were pre-compiled, the static compiler won't be able to optimize their relationship with the project. If the machine is changed for deployment.

More information means better optimizations. JITs FTW.

JITs also tend to optimize for compilation time over final performance. Doing ahead-of-time superoptimization or polyhedral loop transformation isn't going to happen in a JIT.
There's no restriction on the kind of optimization a JIT can do. Perhaps there's a current implementation tendency. In contrast, ahead-of-time (data- and platform-ignorant) compilers are restricted.