Any ideas why they’ve moved from a method-based JIT to tracing?
AFAIK tracing JITs are generally inferior to method-based ones, which is why none of the major JavaScript engines use tracing. Their only advantage seems to be (relative) simplicity, which is essential for the lightweight LuaJIT but not for PHP.
The method JIT they built before failed to provide any improvements on a real world app. Building a JIT compiler for a dynamic language that's actually faster than a fast interpreter is quite tricky!
There's a lot more to it than just compiling what the interpreter does. CRuby's JIT takes a different approach, using C templating rather than DynASM with templating/context threading, but it also fails to improve performance much, for the same reasons.
Tracing JITs are really good at optimizing a small scope. What a tracing JIT achieves by recording a single linear path of control flow, method JITs also try to achieve with branch profiling and pruning.
It works really well for a particular style of Lua. For a language like Ruby or PHP it's a lot harder to make a tracing JIT work well on existing code.
The main problem is that tail duplication causes the number of traces to increase exponentially with each branch. You have to either use trace heuristics which keep the traces very short, or go halfway to a method JIT and build a control flow graph.
I got to the tail explosion problem building a tracing JIT for CRuby and gave up.
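To make the tail explosion concrete, here is a minimal sketch in Python. It assumes a function made of `num_branches` independent if/else diamonds followed by one shared tail block; the block names (`then_0`, `else_0`, `tail`) are made up for illustration. Without merging control flow, every combination of branch outcomes becomes its own trace, each with its own copy of the tail.

```python
from itertools import product

def enumerate_traces(num_branches):
    """List every linear path through `num_branches` sequential if/else diamonds."""
    traces = []
    for outcome in product((True, False), repeat=num_branches):
        # One block per branch outcome, in program order.
        trace = [f"then_{i}" if taken else f"else_{i}"
                 for i, taken in enumerate(outcome)]
        # The shared tail is duplicated into every trace.
        trace.append("tail")
        traces.append(trace)
    return traces

for k in (1, 2, 4, 8):
    print(k, "branches ->", len(enumerate_traces(k)), "traces")
```

The count doubles with each extra branch, which is why the heuristics either cut traces short or fall back to something closer to a control flow graph.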
That doesn't sound right. How would a trace-based JIT be easier to implement? There was a research project which made HotSpot into a trace-based JIT. It reported both faster compilation times, and superior quality of generated code.
(I don't know if the HotSpot folks ever considered adopting the changes.)
> How would a trace-based JIT be easier to implement?
When compiling at method-granularity, with incomplete information about types, complex features are required e.g. On-Stack Replacement (if the current method becomes "hot", it needs to be replaced by a compiled version), Inline Caches, Hidden Classes and Deoptimization.
In a trace-based JIT, all these features can be replaced by "just compile another trace".
We need all those complex features with tracing JITs too. Getting into a trace is basically OSR. Exiting a trace and restoring the interpreter state is basically an OSR exit or deoptimization. It's really just different terminology for the same thing. Inline caches on method calls become guards. We can't just ignore them.
LuaJIT does without hidden classes or property caching because on-trace it's able to eliminate table access and allocation. Performance tanks on OO Lua once you fall off trace. You can work around it with "compile another trace" in a sense that works as long as you have infinite cache.
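A rough Python sketch of the "inline caches become guards" point, with the exit path written out. Everything here is hypothetical (`SideExit`, `guard_type`, `compiled_trace`); a real JIT emits machine code and restores interpreter frames rather than raising exceptions, but the shape is the same: the trace checks its assumptions, and a failed check hands the live values back to the interpreter.

```python
class SideExit(Exception):
    """Raised when a guard fails; carries what the interpreter needs to resume."""
    def __init__(self, resume_pc, live_values):
        super().__init__(f"guard failed, resume at pc={resume_pc}")
        self.resume_pc = resume_pc
        self.live_values = live_values

def guard_type(value, expected_type, resume_pc, live_values):
    # The type check that a method JIT would keep in an inline cache.
    if type(value) is not expected_type:
        raise SideExit(resume_pc, dict(live_values))

def compiled_trace(a, b):
    """Stand-in for code recorded while `a + b` was only ever seen with ints."""
    live = {"a": a, "b": b}
    guard_type(a, int, resume_pc=0, live_values=live)
    guard_type(b, int, resume_pc=0, live_values=live)
    return a + b  # on trace: specialized integer add

def interpret_add(a, b):
    """Generic slow path, standing in for the interpreter."""
    return a + b

def run(a, b):
    try:
        return compiled_trace(a, b)
    except SideExit as exit_info:
        # Off trace: restore state and continue in the interpreter.
        state = exit_info.live_values
        return interpret_add(state["a"], state["b"])

print(run(1, 2))      # stays on trace
print(run("x", "y"))  # guard fails, falls back to the interpreter
```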
Tiny nitpick: the article describes JavaScript as an interpreted language without JIT when most major JS engines do have JIT optimisation. It’s difficult to make this call about any language with more than one implementation.
But anyway, kudos to the PHP folks for yet another improvement. It’s been a long time since I used it, but I’m continually impressed by how far it’s come. Asking a developer to write some PHP is also a useful test of how flexible they are willing to be; there are fewer and fewer reasons against it these days other than personal style preference.
The whole idea of "language X is interpreted/compiled" is a simplification I wish people were more careful with.
CPython is interpreted, PyPy has a JIT, mypyc is compiled. It's the implementation, or even the implementation's specific runtime options, that decide things like that, not the language.
The statement would be far more accurate as "language X is [almost always/sometimes/often/commonly/etc.] interpreted/compiled". Unfortunately a lot of people seem to like speaking in absolutes.
Compilation is just interpreting what a program does and producing machine code (or other code in the case of transpilers) that computes the same thing.
Interpreters are just that, but instead of producing code, they run code in themselves that computes the results directly.
One could even consider machine code just “obfuscated” assembly code. In that sense, machine code is just another language that also could be interpreted (interpretation-based emulators like Bochs) or compiled again (JIT-based emulators like DOSBox).
This brings up a slightly related question: is there a language that can’t be compiled?
Of course not, just compile your interpreter but instead of reading input from a file, have it read input from a fixed string (the source code) that is embedded in the binary. You've just created a super shitty compiler!
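For what it's worth, that trick fits in a few lines. This is only a sketch under made-up assumptions: the "language" is a toy where each line is an integer to add to a running total, and `bundle_compile` is a hypothetical name. The "binary" it produces is just the interpreter's source with the program pasted in as a string constant.

```python
INTERPRETER = '''
def interpret(source):
    """Toy interpreter: each non-empty line is an integer added to a total."""
    total = 0
    for line in source.splitlines():
        line = line.strip()
        if line:
            total += int(line)
    return total
'''

def bundle_compile(program_source):
    """Return a standalone script: the interpreter plus the embedded program."""
    return (
        INTERPRETER
        + f"\nPROGRAM = {program_source!r}\n"
        + "RESULT = interpret(PROGRAM)\n"
        + "print(RESULT)\n"
    )

binary = bundle_compile("1\n2\n39\n")
namespace = {}
exec(binary, namespace)  # stands in for running the produced "binary"; prints 42
```

Nothing about the program was analysed or translated, which is exactly why it is a terrible compiler.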
I personally wouldn’t consider that a compiler but just an interpreter bundled with the source. For example: Electron based desktop programs aren’t considered compiled (by anyone I know) even though they pack the interpreter (Chromium) with the source.
> This brings up a slightly related question: is there a language that can’t be compiled?
Not sure if you're asking this platonically, but Futamura shows us that if you can build an interpreter then you can always transform that automatically to be a compiler.
This is used in practice by some compilers today - they automatically produce a compiler from an interpreter!
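Here is a small Python sketch of the idea behind the first projection: fix the interpreter's program argument, and what remains is a compiled program. The toy language (a list of `("add", k)` / `("mul", k)` steps) is invented for illustration, and `specialize` is written by hand; a real partial evaluator would derive it from `interpret` automatically.

```python
def interpret(program, x):
    """Toy interpreter: apply each (op, constant) step of `program` to x."""
    for op, k in program:
        if op == "add":
            x = x + k
        elif op == "mul":
            x = x * k
        else:
            raise ValueError(f"unknown op {op!r}")
    return x

def specialize(program):
    """Specialize `interpret` to a fixed program (hand-written, not automatic).

    The dispatch loop runs once, at "compile time", and leaves behind
    straight-line code with no loop and no opcode checks.
    """
    lines = ["def compiled(x):"]
    for op, k in program:
        if op == "add":
            lines.append(f"    x = x + {k!r}")
        elif op == "mul":
            lines.append(f"    x = x * {k!r}")
        else:
            raise ValueError(f"unknown op {op!r}")
    lines.append("    return x")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["compiled"]

program = [("add", 2), ("mul", 10)]
compiled = specialize(program)
print(compiled(3), interpret(program, 3))  # both compute (3 + 2) * 10
```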
That's not really a barrier. It's "can't be parsed without ambiguity", not "can't be parsed". You compile cases like that by compiling a check for the relevant condition, then compile the possible versions in the if/else branches. Unless you can prove which case it will be from other code - then you can simplify anyway.
By wielding the Futamura projections [0][1][2], any interpreter may be turned into a compiler. Perl is handled as one case. Crucially, the resulting compiler need not be fast; if the interpreter is slow, then the compiler will be slow too (this is a special case of the central meme from [3], "if the N'th Futamura projection has quality Q, then the N+1'th Futamura projection will also have quality Q.")
>Unfortunately a lot of people seem to like speaking in absolutes.
People like speaking in the general case. It saves time over enumerating every inconsequential or statistically irrelevant exception.
The real problem is people confusing casual-discussion absolutes (which mean "for the large majority/for the ones people care about") with mathematical absolutes (i.e. "X is Y for each and every X").
I should probably add this kind of thing to the article then. It is nearly impossible to state "Language X is compiled/interpreted/jitted".
I didn't explicitly say that JS doesn't come with JIT, because I'd be putting Node.js and every browser engine in the same bag. I simply can't be sure about all of them.
I'll add this caveat very soon, that the dialect (JS, PHP, Python) doesn't really matter, and make it clear that I'm talking about engines, possibly pointing to them directly.
I hope you understand, though, that I wanted to make JIT as a concept understandable. The language comparisons are secondary.
What I've noticed is that JITs often can reach C speed in workloads like this:
sum = 0
for i in xrange(n):
    for j in xrange(i):
        sum += A[i][j]
And when they reach that milestone, some people call it "as fast as C".
Never mind that that's not what people actually write in Python or PHP. It's a synthetic benchmark, not a real workload.
The workloads in those languages are generally oriented around strings, hash tables, and function/method calls.
And the JITs don't seem to do nearly as good a job there. I tested PyPy on Oil [1] a few years ago, and it made it slower, not faster. And it used more memory. (Though PyPy is an amazing project in many respects.)
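To make the contrast concrete, here are the two styles side by side in Python 3. Both functions are illustrative only: `triangular_sum` is the numeric loop from above, and `top_words` is a made-up stand-in for the string and hash-table work that dominates typical scripts.

```python
from collections import Counter

def triangular_sum(A):
    """The synthetic benchmark: nested integer loops over array cells."""
    total = 0
    for i in range(len(A)):
        for j in range(i):
            total += A[i][j]
    return total

def top_words(lines, n=3):
    """A more typical workload: string splitting, method calls, dict updates."""
    counts = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word.strip(".,!?")] += 1
    return counts.most_common(n)

print(triangular_sum([[1, 2], [3, 4]]))
print(top_words(["The cat sat.", "The cat ran!", "the end"], n=2))
```

The first function is mostly arithmetic that a JIT can unbox and keep in registers; the second spends its time in allocation, hashing, and library calls.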
This is not what people write in Python or PHP, but this is what people write in C extensions for Python or PHP. Having your JIT be that fast allows you to forego those extensions and write the low-level hot loops in the same language, and that's a huge improvement.
You usually don't care how your matrix multiplication/regex matching/unicode normalization/JSON parsing is implemented, but people had to make those, and they are users of the language too.
Even though it might not change the bottom-line for your high-level app.
Well, the problem is that Python and PHP are actually bad languages for expressing code like that. For expressing C. For one, they're not statically typed.
Julia is a dynamic language that seems to do better because it was designed for this purpose.
But it doesn't seem to have panned out in practice in Python, or PHP as far as I know. Those languages have huge piles of C, and whenever you call into C, the JIT gets confused. People don't seem to rewrite their huge piles of C in Python or PHP. In Python, it's more likely Cython.
I'd like to see pointers to counterexamples -- where people actually wrote some C-like code in Python or PHP and let the JIT do its work. I haven't seen it, aside from the PyPy project itself, and maybe a few other examples. I think you would still take a significant performance hit.
The issue is that C compilers in 2020 are even better at compiling the example I showed. They do amazing things with that kind of code that state-of-the-art JITs don't in practice.
It’s not that the JIT gets confused, it’s that the C APIs for these languages can do almost anything - even stuff that you can’t normally do in the language. So you are faced with a giant optimization boundary.
However a call to a shared library that isn’t linked against your language API is not very expensive as you have a much better handle on the values that are escaping and can make much better optimization choices.
In the Truffle project we are using an LLVM bitcode interpreter that allows us to JIT right through that language boundary and still link to native shared libraries. This means people shouldn’t have to rewrite their C extensions and we can hopefully still run the combination of high level language and C extension faster.
That optimization boundary seems like it's much more of a problem for TruffleRuby than it is for language-specific native implementations? IIRC TruffleRuby relies a lot on being able to optimize away Ruby objects and frames and there's quite a performance cliff if you have to materialize full escaping objects?
JSC and LuaJIT have simpler ways to deal with calling native code which might do weird stuff.
>Well, the problem is that Python and PHP are actually bad languages for expressing code like that. For expressing C. For one, they're not statically typed.
The Psyco project (now dead) used to get very reasonable speedups (factors of several) in pure Python code, particularly for numeric algorithms. It was retired because PyPy was being developed and was expected to solve all speed problems. I wonder why this approach worked while other python JITs did not.
This may sound contradictory, but while the expected speedup on "realistic" code isn't as big, it's also easier for JIT-compiling VMs to optimize high-level abstractions than it is to optimize something like C.
JIT compilers have different optimizations available like:
Fast inlined heap allocation (normally much faster than malloc/free). V8 even does allocation combining
Transparent ropes for strings
High level alias analysis for hash tables and objects
Inlining dynamic dispatched functions and dynamically loaded functions
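As an illustration of the rope item, here is a minimal Python sketch. The `Rope` class is hypothetical and visible to the user; in a VM the same trick is "transparent" because it hides behind the ordinary string type. Concatenation just builds a tree node, and the characters are only copied when someone needs the flat string.

```python
class Rope:
    """Lazy string concatenation: O(1) concat, flatten only on demand."""
    __slots__ = ("left", "right", "length", "_flat")

    def __init__(self, left, right):
        self.left = left      # str or Rope
        self.right = right    # str or Rope
        self.length = len(left) + len(right)
        self._flat = None     # cached flat string, filled in lazily

    def __len__(self):
        return self.length

    def __add__(self, other):
        # No copying here: just a new tree node.
        return Rope(self, other)

    def flatten(self):
        if self._flat is None:
            parts, stack = [], [self]
            while stack:  # iterative walk, so deep ropes don't hit recursion limits
                node = stack.pop()
                if isinstance(node, Rope):
                    if node._flat is not None:
                        parts.append(node._flat)
                    else:
                        stack.append(node.right)  # pushed first, visited second
                        stack.append(node.left)
                else:
                    parts.append(node)
            self._flat = "".join(parts)
        return self._flat

r = Rope("foo", "bar") + "baz"
print(len(r), r.flatten())
```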
I think a workload like this is more common in the PHP world. Not saying that others don’t exist, but handling routing, queries, cached content is very different from simply doing mathematical/memory intensive applications.
You could do that on a VPS or your own machine, but a heavyweight plugin like WooCommerce can incur performance and memory issues that they can do little to improve, if you’re referring to benchmarks with a CMS.
The irony of "as fast as C" comparison, is that anyone doing 8 and 16 bit coding on home micros remembers how lousy C compilers used to be (like most other high level languages), to the point that any junior Assembly developer could easily write much better code.
Any language can eventually reach that point with enough money, time, and in C's case doing 200+ optimizations with unexpected results.
No implementation of a language as dynamic as something like PHP has ever managed it in practice. TruffleRuby uses Java for particularly performance sensitive parts. JSC relies on calling C++ or "intrinsics" which are hand-written IR snippets of code to JIT.
Well, most of PHP is calling into C functions (all the standard libs), and as for the rest, a JIT can absolutely be as fast as C (or faster, due to micro-optimizations, profiling, knowledge of non-aliasing, etc.).
Unless we are speaking about Java before Java 1.2, it is definitely not interpreted, there are plenty of JIT and AOT implementations without any kind of interpretation step.
It has been 25 years; time to learn that implementations and languages are not the same.
Good point, but are there really "plenty" of Java implementations that lack an interpreter? There's Graal and Excelsior JET for AOT. Any others? Which implementations do JIT without any interpretation?
There is a small error in the text. When describing the opcache.jit setting it shows examples where the third flag is set to 5, eg 1255, but the table of possible values for the third flag, 'T - JIT trigger', goes from 0 to 4.
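For anyone checking values like this by hand, here is a small Python sketch that splits a four-digit opcache.jit value into its digits. The helper name is made up, and the letters C, R, T, O follow the "CRTO" layout as I understand it from the PHP docs; only the third position (T, the JIT trigger) is confirmed by the article's table. It deliberately does not validate the range of each digit, since that is the very thing in question.

```python
def decode_opcache_jit(value):
    """Split a four-digit opcache.jit value into its C, R, T, O digits."""
    text = str(value)
    if len(text) != 4 or not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected four decimal digits, got {value!r}")
    # Position 3 (T) is the JIT trigger flag discussed above.
    return dict(zip("CRTO", (int(ch) for ch in text)))

print(decode_opcache_jit(1255))  # third digit, T, comes out as 5
```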
Thanks a lot! I blindly fetched this from the reference mentioned there. I'll update this as soon as I have time (probably during this week) and also let the author of the referenced post know about it.
Hey man, author here. Thanks for the feedback. I recently added AdSense there and I'm experimenting with the number of ads on the page. Currently I'm letting AdSense decide how many ads it should place and where.
But if you believe the amount is so harmful for your experience, don't worry. I'll be more than glad to reduce this amount.
https://github.com/php/php-src/pull/5874
https://github.com/php/php-src/commit/4bf2d09edeb14467ba7955...