Deep Dive into PHP 8's JIT (thephp.website)
100 points by nawarian 2114 days ago
10 comments

This article is actually now quite out of date as PHP has switched to a tracing JIT which appears to be heavily based on LuaJIT.

https://github.com/php/php-src/pull/5874

https://github.com/php/php-src/commit/4bf2d09edeb14467ba7955...

Any ideas why they’ve moved from a method-based JIT to tracing?

AFAIK tracing JITs are generally inferior to method-based ones, which is why none of the major JavaScript engines use tracing. Their only advantage seems to be (relative) simplicity, which is essential for the lightweight LuaJIT but not for PHP.

The method JIT they built before failed to provide any improvements on a real world app. Building a JIT compiler for a dynamic language that's actually faster than a fast interpreter is quite tricky!

There's a lot more to it than just compiling what the interpreter does. CRuby's JIT has a different approach using C templating rather than Dynasm and templating/context threading but it also fails to improve performance much for the same reasons.

Tracing JITs are really good at optimizing a small scope. What a tracing JIT does with recording a single path of linear control flow the method JITs also try to do with branch profiling and pruning.

It works really well for a particular style of Lua. For a language like Ruby or PHP it's a lot harder to make a tracing JIT work well on existing code.

The main problem is tail duplication causes the number of traces to increase exponentially with each branch. You have to either have trace heuristics which keep the traces very short or go half way to a method JIT and build a control flow graph.
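
To put rough numbers on that, here is a minimal sketch. It assumes a loop body made of k independent if/else diamonds in sequence, where every arm is hot and the JIT never merges traces back together. `enumerate_traces` and `count_traces` are hypothetical names for illustration, not part of any real JIT.

```python
from itertools import product


def enumerate_traces(num_branches):
    """List every linear path through `num_branches` sequential if/else diamonds.

    Each path is one candidate trace: a tuple recording which arm was taken
    at each branch. Assumes no trace merging, which is the worst case.
    """
    return list(product(("then", "else"), repeat=num_branches))


def count_traces(num_branches):
    """Number of distinct traces under the same worst-case assumption."""
    return len(enumerate_traces(num_branches))


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, "branches ->", count_traces(k), "traces")
```

Under those assumptions the count doubles with every extra branch, which is why you end up needing either heuristics that keep traces short or something closer to a control flow graph.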

I got to the tail explosion problem building a tracing JIT for CRuby and gave up.

That doesn't sound right. How would a trace-based JIT be easier to implement? There was a research project which made HotSpot into a trace-based JIT. It reported both faster compilation times, and superior quality of generated code.

(I don't know if the HotSpot folks ever considered adopting the changes.)

http://www.ssw.jku.at/Research/Papers/Haeubl11/

> How would a trace-based JIT be easier to implement?

When compiling at method-granularity, with incomplete information about types, complex features are required e.g. On-Stack Replacement (if the current method becomes "hot", it needs to be replaced by a compiled version), Inline Caches, Hidden Classes and Deoptimization.

In a trace-based JIT, all these features can be replaced by "just compile another trace".

We need all those complex features with tracing JITs too. Getting into a trace is basically OSR. Exiting a trace and restoring the interpreter state is basically an OSR exit or deoptimization. It's really just different terminology for the same thing. Inline caches on method calls become guards. We can't just ignore them.
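
A minimal sketch of that equivalence, assuming a toy `add` operation rather than any real VM. The "trace" is specialized for ints and protected by a guard; when the guard fails, it hands its state back to the generic interpreter, which is the side exit / deoptimization step. All names here (`SideExit`, `traced_add`, `interpret_add`, `run`) are hypothetical.

```python
class SideExit(Exception):
    """Raised when a guard fails; carries the state needed to resume interpreting."""

    def __init__(self, state):
        super().__init__("guard failed")
        self.state = state


def interpret_add(state):
    # Generic slow path: dispatches on whatever types show up at run time.
    a, b = state["a"], state["b"]
    if isinstance(a, str) or isinstance(b, str):
        return str(a) + str(b)
    return a + b


def traced_add(state):
    # "Compiled trace" recorded while both operands were ints.
    # The guard plays the role an inline cache check plays in a method JIT.
    a, b = state["a"], state["b"]
    if type(a) is not int or type(b) is not int:
        raise SideExit(state)  # leave the trace, restore interpreter state
    return a + b


def run(state):
    """Try the trace first; fall back to the interpreter on a side exit."""
    try:
        return traced_add(state), "trace"
    except SideExit as side_exit:
        return interpret_add(side_exit.state), "interpreter"
```

Calling `run({"a": 1, "b": 2})` stays on trace, while `run({"a": "x", "b": 2})` fails the guard and finishes in the interpreter.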

LuaJIT does without hidden classes or property caching because on-trace it's able to eliminate table access and allocation. Performance tanks on OO Lua once you fall off trace. You can work around it with "compile another trace" in a sense that works as long as you have infinite cache.

Tiny nitpick: the article describes JavaScript as an interpreted language without JIT when most major JS engines do have JIT optimisation. It’s difficult to make this call about any language with more than one implementation.

But anyway, kudos to the PHP folks for yet another improvement. It’s been a long time since I used it, but I’m continually impressed by how far it’s come. Asking developers to write some PHP is also a useful test of how flexible they are willing to be; there are fewer and fewer reasons against it these days other than personal style preference.

The whole idea of "language X is interpreted/compiled" is a simplification I wish people were more careful with.

CPython is interpreted, PyPy has a JIT, mypyc is compiled. It's the implementation, or even the implementation's specific runtime options, that decides things like that, not the language.

The statement would be far more accurate as "language X is [almost always/sometimes/often/commonly/etc.] interpreted/compiled". Unfortunately a lot of people seem to like speaking in absolutes.

C can be interpreted too: https://en.wikipedia.org/wiki/CINT

Theoretically, any language can be interpreted.

Compilation is just interpreting what a program does and producing machine code (or other code in the case of transpilers) that computes the same thing.

Interpreters are just that, but instead of producing code, they run code in themselves that computes the results directly.
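
A minimal sketch of that distinction, assuming a toy expression language of numbers, variable names and binary `+` / `*` written as nested tuples. The interpreter computes the result directly; the "compiler" walks the same tree but emits Python source that computes the same thing. All function names are hypothetical.

```python
def interpret(expr, env):
    """Walk the tree and compute the value directly."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lhs, rhs = interpret(left, env), interpret(right, env)
    if op == "+":
        return lhs + rhs
    if op == "*":
        return lhs * rhs
    raise ValueError(f"unknown operator: {op!r}")


def compile_to_source(expr):
    """Walk the same tree, but produce code instead of a value."""
    if isinstance(expr, (int, float)):
        return repr(expr)
    if isinstance(expr, str):
        return f"env[{expr!r}]"
    op, left, right = expr
    if op not in ("+", "*"):
        raise ValueError(f"unknown operator: {op!r}")
    return f"({compile_to_source(left)} {op} {compile_to_source(right)})"


def compile_expr(expr):
    """Turn the emitted source into a callable taking an environment dict."""
    return eval("lambda env: " + compile_to_source(expr))
```

For `("+", "x", ("*", 2, "y"))` both routes give the same answer; the compiled one just pays the tree walk once, up front.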

One could even consider machine code just “obfuscated” assembly code. In that sense, machine code is just another language that also could be interpreted (interpretation-based emulators like Bochs) or compiled again (JIT-based emulators like DOSBox).

This brings up a slightly related question: is there a language that can’t be compiled?

Of course not, just compile your interpreter but instead of reading input from a file, have it read input from a fixed string (the source code) that is embedded in the binary. You've just created a super shitty compiler!

I personally wouldn’t consider that a compiler but just an interpreter bundled with the source. For example: Electron based desktop programs aren’t considered compiled (by anyone I know) even though they pack the interpreter (Chromium) with the source.

> This brings up a slightly related question: is there a language that can’t be compiled?

Not sure if you're asking this platonically, but Futamura shows us that if you can build an interpreter then you can always transform it automatically into a compiler.

This is used in practice by some compilers today - they automatically produce a compiler from an interpreter!

"question: is there a language that can’t be compiled?"

There is some question whether Perl could be, because parsing it without running it has some ambiguity. https://www.perlmonks.org/?node_id=663393

That's not really a barrier. It's "can't be parsed without ambiguity", not "can't be parsed". You compile cases like that by compiling a check for the relevant condition, then compile the possible versions in the if/else branches. Unless you can prove which case it will be from other code - then you can simplify anyway.

By wielding the Futamura projections [0][1][2], any interpreter may be turned into a compiler. Perl is handled as one case. Crucially, the resulting compiler need not be fast; if the interpreter is slow, then the compiler will be slow too (this is a special case of the central meme from [3], "if the N'th Futamura projection has quality Q, then the N+1'th Futamura projection will also have quality Q.")

[0] https://en.wikipedia.org/wiki/Partial_evaluation#Futamura_pr...

[1] http://blog.sigfpe.com/2009/05/three-projections-of-doctor-f...

[2] https://www.gwern.net/docs/cs/2009-gluck.pdf

[3] https://www.itu.dk/people/sestoft/pebook/

>Unfortunately a lot of people seem to like speaking in absolutes.

People like speaking in the general case. It saves the time of enumerating every inconsequential or statistically irrelevant exception.

The real problem is people confusing casual-discussion absolutes (which mean "for the large majority/for the ones people care about") with mathematical absolutes (i.e. "X is Y for each and every X").

And the Rust compiler ships with a Rust interpreter, used to evaluate const fns (functions whose output can be determined at compile time).

const fns can only use a subset of the language AIUI

Well, the idea being that CPython is 99% of what people use, and the others are nearly irrelevant projects...

I should probably add this kind of thing to the article then. It is nearly impossible to state "Language X is compiled/interpreted/jitted".

I didn't explicitly say that JS doesn't come with JIT, because I'd be putting Node.JS and every browser engine in the same bag. I simply can't be sure about all of them.

I'll very soon add the caveat that the language (JS, PHP, Python) doesn't really matter, make it clear that I'm talking about engines, and possibly point directly to them.

I hope you understand, though, that I wanted to make JIT as a concept understandable. The language comparisons are secondary.

Thanks a lot for your feedback!

PHP's story is worth remembering: it shows that no matter how low you start, in fertile soil you can grow tall and go far.

"PHP as fast as C" - People who work on PHP language development may die laughing reading this statement.

The truth can be harsh, but it is hard to understand why people sometimes overvalue things to such an extent.

What I've noticed is that JITs often can reach C speed in workloads like this:

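    # Python 3 spelling: range replaces Python 2's xrange
    n = 1000
    A = [[1] * n for _ in range(n)]  # placeholder data so the loop runs
    total = 0  # renamed from sum to avoid shadowing the builtin
    for i in range(n):
        for j in range(i):
            total += A[i][j]

And when they reach that milestone, some people call it "as fast as C".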

Never mind that that's not what people actually write in Python or PHP. It's a synthetic benchmark, not a real workload.

The workloads in those languages are generally oriented around strings, hash tables, and function/method calls.

And the JITs don't seem to do nearly as good a job there. I tested PyPy on Oil [1] a few years ago, and it made it slower, not faster. And it used more memory. (Though PyPy is an amazing project in many respects.)

[1] https://www.oilshell.org

This is not what people write in Python or PHP, but this is what people write in C extensions for Python or PHP. Having your JIT be that fast allows you to forego those extensions and write the low-level hot loops in the same language, and that's a huge improvement.

You usually don't care how your matrix multiplication/regex matching/unicode normalization/JSON parsing is implemented, but people had to make those, and they are users of the language too.

Even though it might not change the bottom-line for your high-level app.

Well, the problem is that Python and PHP are actually bad languages for expressing code like that. For expressing C. For one, they're not statically typed.

Julia is a dynamic language that seems to do better because it was designed for this purpose.

But it doesn't seem to have panned out in practice in Python, or PHP as far as I know. Those languages have huge piles of C, and whenever you call into C, the JIT gets confused. People don't seem to rewrite their huge piles of C in Python or PHP. In Python, it's more likely Cython.

I'd like to see pointers to counterexamples -- where people actually wrote some C-like code in Python or PHP and let the JIT do its work. I haven't seen it, aside from the PyPy project itself, and maybe a few other examples. I think you would still take a significant performance hit.

The issue is that C compilers in 2020 are even better at compiling the example I showed. They do amazing things with that kind of code that state-of-the-art JITs don't in practice.

It’s not that the JIT gets confused, it’s that the C APIs for these languages can do almost anything - even stuff that you can’t normally do in the language. So you are faced with a giant optimization boundary.

However a call to a shared library that isn’t linked against your language API is not very expensive as you have a much better handle on the values that are escaping and can make much better optimization choices.

In the Truffle project we are using an LLVM bitcode interpreter that allows us to JIT right through that language boundary and still link to native shared libraries. This means people shouldn’t have to rewrite their C extensions and we can hopefully still run the combination of high level language and C extension faster.

That optimization boundary seems like it's much more of a problem for TruffleRuby than it is for language-specific native implementations? IIRC TruffleRuby relies a lot on being able to optimize away Ruby objects and frames and there's quite a performance cliff if you have to materialize full escaping objects?

JSC and LuaJIT have simpler ways to deal with calling native code which might do weird stuff.

>Well, the problem is that Python and PHP are actually bad languages for expressing code like that. For expressing C. For one, they're not statically typed.

Well, C hardly is, either...

The Psyco project (now dead) used to get very reasonable speedups (factors of several) in pure Python code, particularly for numeric algorithms. It was retired because PyPy was being developed and was expected to solve all speed problems. I wonder why this approach worked while other python JITs did not.

C is frequently 100x faster than Python for code like the example I showed. With autovectorization and other optimizations it can be 500x.

So if a Python JIT does 10-50x better than CPython on a numeric workload, that sounds impressive, but it's still slow compared to C.

And again they don't get 10-50x on string/hash/method call workloads. I think they're lucky to get 2x in some of those cases.

This is somewhat contradictory, but while the expected speedup on "realistic" code isn't as big, it is also easier for JIT-compiling VMs to optimize high-level abstractions than it is for something like C.

JIT compilers have different optimizations available like:

Fast inlined heap allocation (normally much faster than malloc/free). V8 even does allocation combining

Transparent ropes for strings

High level alias analysis for hash tables and objects

Inlining dynamic dispatched functions and dynamically loaded functions
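
To make the rope item concrete, here is a minimal sketch. It is a toy, not how any particular VM implements it. Concatenation just allocates a small tree node, and the characters are only copied once when the string is actually needed flat. `Rope` and `concat` are hypothetical names.

```python
class Rope:
    """Concatenation tree; leaves are ordinary str objects."""

    __slots__ = ("left", "right", "length")

    def __init__(self, left, right):
        self.left = left
        self.right = right
        # len() works on both str leaves and Rope nodes (via __len__ below).
        self.length = len(left) + len(right)

    def __len__(self):
        return self.length

    def flatten(self):
        """Copy the characters exactly once, left to right, without recursion."""
        parts = []
        stack = [self]
        while stack:
            node = stack.pop()
            if isinstance(node, Rope):
                # Push right first so the left side is popped (and emitted) first.
                stack.append(node.right)
                stack.append(node.left)
            else:
                parts.append(node)
        return "".join(parts)


def concat(a, b):
    """Constant-time concatenation: no characters are copied here."""
    return Rope(a, b)
```

Building a string with repeated `concat` calls in a loop then costs one small allocation per step, and a single `flatten()` at the end does all the copying.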

Do you have pointers on those? I'd be interested in the rope optimizations, etc.

What I want to see is a benchmark like this one: https://kinsta.com/blog/php-benchmarks/

I think a workload like this is more common in the PHP world. Not saying that others don’t exist, but handling routing, queries, cached content is very different from simply doing mathematical/memory intensive applications.

You could do that on a VPS or your own machine, but a heavyweight plugin like WooCommerce can incur performance and memory issues that they can do little to improve, if you’re referring to a benchmark with a CMS.

The irony of "as fast as C" comparison, is that anyone doing 8 and 16 bit coding on home micros remembers how lousy C compilers used to be (like most other high level languages), to the point that any junior Assembly developer could easily write much better code.

Any language can eventually reach that point with enough money, time, and in C's case doing 200+ optimizations with unexpected results.

Well, most of php is just calling C functions. So if they add a JIT to that, it could be a believable claim. Why not?

No implementation of a language as dynamic as something like PHP has ever managed it in practice. TruffleRuby uses Java for particularly performance sensitive parts. JSC relies on calling C++ or "intrinsics" which are hand-written IR snippets of code to JIT.

Well, most of PHP is calling into C functions (all the standard libs), and as for the rest a JIT can absolutely be as fast as C (or faster, due to micro-optimizations, profiling, knowledge of non-aliasing, etc).

So there's nothing really funny about it....

Unless we are speaking about Java before Java 1.2, it is definitely not interpreted, there are plenty of JIT and AOT implementations without any kind of interpretation step.

It has been 25 years; time to learn that implementations and languages are not the same.

Good point, but are there really "plenty" of Java implementations that lack an interpreter? There's Graal and Excelsior JET for AOT. Any others? Which implementations do JIT without any interpretation?

I might be wrong here, as I'm not so close to Java development. But a language implementing JIT, at least to me, is interpreted.

Could you please point to an implementation where a JIT-capable engine doesn't include interpretation in its runtime?

In every case, thanks a lot for your feedback!

Well, from a CS compiler theory point of view it is not.

For example in .NET, MSIL goes directly into a pipeline that produces native. You can easily validate that RyuJIT has no interpretation.

Or for example, watchOS applications packaged with bitcode, get JIT compiled at installation time.

V8 had only baseline compilation until much later https://v8.dev/blog/ignition-interpreter

It's not super exotic to do this.

There is a small error in the text. When describing the opcache.jit setting it shows examples where the third flag is set to 5, e.g. 1255, but the table of possible values for the third flag, 'T - JIT trigger', goes from 0 to 4.

Thanks a lot! I blindly fetched this from the reference mentioned there. I'll update this as soon as I have time (probably during this week) and also let the author of the referenced post know about it.
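
For anyone checking values like that by hand, here is a minimal sketch that splits a four-digit opcache.jit value into its positional flags. Only the 'T - JIT trigger' label for the third digit comes from the comment above; the C, R and O labels are my recollection of the PHP documentation and should be treated as an assumption. `split_opcache_jit` is a hypothetical helper, not a PHP or Python API, and it deliberately does not validate the allowed range of each digit.

```python
def split_opcache_jit(value):
    """Split a four-digit opcache.jit value into its positional flags.

    Assumed layout: C, R, T, O (third digit is the 'T - JIT trigger' flag).
    No range checking is done on the individual digits.
    """
    digits = str(value)
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"expected a four-digit value, got {value!r}")
    c, r, t, o = (int(d) for d in digits)
    return {"C": c, "R": r, "T": t, "O": o}


if __name__ == "__main__":
    print(split_opcache_jit(1255))
```

For 1255 the third digit comes out as 5, which is the value the table in the article did not list.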
Didn't Facebook also develop a JIT for their PHP? Or have they moved on from PHP?

Hack/HHVM no longer has a goal to be PHP compatible, but it looks pretty much the same. It does have a JIT: https://hhvm.com/blog/2027/faster-and-cheaper-the-evolution-...

Do we know if anyone outside of Facebook uses Hack/HHVM in production?

Slack is the only other company you might have heard of (but still, Slack).

Didn't know that. Thanks for the info!

Maybe everyone on HN has Adblock/PiHole on.

But this blog has 5 Google ads on a single page. I don't mind one or two, top and bottom. But 5, right in the middle of every section.

I have been out of advertising for a long time, so I might not be aware, but back in 2009-10, AdSense had a limit of 3 ads per page.

Is that not a thing anymore? Genuinely curious

Edit: Yes, some initial searches reveal that it's not a thing anymore.

Blogs like these are the main reason to use ad blockers. And from their perspective: everyone using ad blockers is the reason they need 5 ads on the site.

Hey man, author here. Thanks for the feedback. I recently added AdSense there and I'm experimenting with the number of ads on the page. Currently I'm letting AdSense decide how many ads it should place and where.

But if you believe the amount is so harmful for your experience, don't worry. I'll be more than glad to reduce this amount.

Cheers!

Thank you all for sharing your comments and thoughts. I'm unfortunately out of time today, but I'll try to address your comments asap :)

Cheers!

Not a deep dive, rather a very shallow dive.

Deep dives were handled before: https://wiki.php.net/rfc/jit And its discussion https://externals.io/message/103903

I definitely misread this title as the P8 JIT that IBM was working on for a while.