What I've noticed is that JITs can often reach C speed in workloads like this:
    sum = 0
    for i in xrange(n):
        for j in xrange(i):
            sum += A[i][j]
And when they reach that milestone, some people call it "as fast as C".
Never mind that that's not what people actually write in Python or PHP. It's a synthetic benchmark, not a real workload.
The workloads in those languages are generally oriented around strings, hash tables, and function/method calls.
And the JITs don't seem to do nearly as good a job there. I tested PyPy on Oil [1] a few years ago, and it made it slower, not faster. And it used more memory. (Though PyPy is an amazing project in many respects.)
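To make the contrast concrete, here is a minimal sketch of the two kinds of workload side by side: the numeric triangular loop from above and a small string/hash-table loop. All function names here are hypothetical and not taken from Oil or PyPy, the string workload is only a stand-in for "real" code, and it uses Python 3's `range` in place of `xrange`. It makes no claim about what numbers you will see; the point is only that the same script can be run under different interpreters to compare them.

```python
import timeit


def triangular_sum(A):
    # The "C-like" workload: nested integer loops over a list of lists.
    total = 0
    for i in range(len(A)):
        row = A[i]
        for j in range(i):
            total += row[j]
    return total


def word_counts(lines):
    # A workload closer to typical scripting code: string splitting,
    # method calls, and hash-table updates.
    counts = {}
    for line in lines:
        for word in line.split():
            key = word.lower()
            counts[key] = counts.get(key, 0) + 1
    return counts


def make_matrix(n):
    # n x n matrix of small integers (illustrative data only).
    return [[i * n + j for j in range(n)] for i in range(n)]


def make_lines(n):
    # n short lines built from a fixed vocabulary (illustrative data only).
    words = ["Alpha", "beta", "GAMMA", "delta"]
    return [
        " ".join(words[(i + k) % len(words)] for k in range(3))
        for i in range(n)
    ]


def time_both(n=200, repeat=3):
    # Best-of-`repeat` wall-clock time in seconds for each workload.
    A = make_matrix(n)
    lines = make_lines(n * n // 6)
    return {
        "triangular_sum": min(
            timeit.repeat(lambda: triangular_sum(A), number=1, repeat=repeat)
        ),
        "word_counts": min(
            timeit.repeat(lambda: word_counts(lines), number=1, repeat=repeat)
        ),
    }
```

Calling `time_both()` under two different interpreters and comparing the ratios per workload is one rough way to see whether a JIT helps the second kind of code as much as the first.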
This is not what people write in Python or PHP, but this is what people write in C extensions for Python or PHP. Having your JIT be that fast allows you to forgo those extensions and write the low-level hot loops in the same language, and that's a huge improvement.
You usually don't care how your matrix multiplication/regex matching/unicode normalization/JSON parsing is implemented, but people had to make those, and they are users of the language too.
That is true even though it might not change the bottom line for your high-level app.
Well, the problem is that Python and PHP are actually bad languages for expressing code like that. For expressing C. For one, they're not statically typed.
Julia is a dynamic language that seems to do better because it was designed for this purpose.
But it doesn't seem to have panned out in practice in Python or PHP, as far as I know. Those languages have huge piles of C, and whenever you call into C, the JIT gets confused. People don't seem to rewrite their huge piles of C in Python or PHP. In Python, it's more likely Cython.
I'd like to see pointers to counterexamples -- where people actually wrote some C-like code in Python or PHP and let the JIT do its work. I haven't seen it, aside from the PyPy project itself, and maybe a few other examples. I think you would still take a significant performance hit.
The issue is that C compilers in 2020 are even better at compiling the example I showed. They do amazing things with that kind of code that state-of-the-art JITs don't in practice.
It’s not that the JIT gets confused, it’s that the C APIs for these languages can do almost anything - even stuff that you can’t normally do in the language. So you are faced with a giant optimization boundary.
However, a call to a shared library that isn’t linked against your language API is not very expensive, as you have a much better handle on the values that are escaping and can make much better optimization choices.
In the Truffle project we are using an LLVM bitcode interpreter that allows us to JIT right through that language boundary and still link to native shared libraries. This means people shouldn’t have to rewrite their C extensions and we can hopefully still run the combination of high level language and C extension faster.
That optimization boundary seems like it's much more of a problem for TruffleRuby than it is for language-specific native implementations? IIRC TruffleRuby relies a lot on being able to optimize away Ruby objects and frames and there's quite a performance cliff if you have to materialize full escaping objects?
JSC and LuaJIT have simpler ways to deal with calling native code which might do weird stuff.
>Well, the problem is that Python and PHP are actually bad languages for expressing code like that. For expressing C. For one, they're not statically typed.
The Psyco project (now dead) used to get very reasonable speedups (factors of several) in pure Python code, particularly for numeric algorithms. It was retired because PyPy was being developed and was expected to solve all speed problems. I wonder why this approach worked while other Python JITs did not.
This is somewhat contradictory, but while the expected speedup on "realistic" code isn't as big, it's also easier for JIT-compiling VMs to optimize high-level abstractions than it is for something like C.
JIT compilers have different optimizations available, such as:
Fast inlined heap allocation (normally much faster than malloc/free). V8 even does allocation combining.
Transparent ropes for strings.
High-level alias analysis for hash tables and objects.
Inlining dynamically dispatched functions and dynamically loaded functions.
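To illustrate what "ropes" means in that list, here is a minimal sketch in Python. The `Rope` class and its methods are hypothetical names made up for this example, and this is not how V8 or any other particular VM implements it. The idea is only that concatenation builds a small tree node in constant time, and the characters are copied into a flat string once, when someone actually needs them.

```python
class Rope:
    """Lazy string concatenation: a binary tree whose leaves are str."""

    def __init__(self, left, right=None):
        # `left` and `right` are each either a str or another Rope.
        self.left = left
        self.right = right
        # Cache the total length so len() never walks the tree.
        self.length = len(left) + (len(right) if right is not None else 0)

    def __len__(self):
        return self.length

    def __add__(self, other):
        # Concatenation allocates one node; no characters are copied.
        if not isinstance(other, Rope):
            other = Rope(other)
        return Rope(self, other)

    def flatten(self):
        # Iterative left-to-right traversal, so long chains of
        # concatenations do not hit the recursion limit.
        parts = []
        stack = [self]
        while stack:
            node = stack.pop()
            if isinstance(node, str):
                parts.append(node)
            elif node is not None:
                # Push right first so that left is processed first.
                stack.append(node.right)
                stack.append(node.left)
        return "".join(parts)
```

For example, `(Rope("ab") + "cd" + Rope("ef")).flatten()` yields `"abcdef"`. A VM that does this transparently hides the tree behind the normal string type, which a C program using plain `char *` buffers does not get for free.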
I think a workload like this is more common in the PHP world. Not saying that others don’t exist, but handling routing, queries, cached content is very different from simply doing mathematical/memory intensive applications.
You could do that on a VPS or your own machine, but a heavyweight plugin like WooCommerce can incur performance and memory issues that they can do little to improve, if you’re referring to benchmarks with a CMS.
The irony of the "as fast as C" comparison is that anyone who did 8- and 16-bit coding on home micros remembers how lousy C compilers used to be (like those for most other high-level languages), to the point that any junior Assembly developer could easily write much better code.
Any language can eventually reach that point with enough money, time, and in C's case doing 200+ optimizations with unexpected results.
No implementation of a language as dynamic as something like PHP has ever managed it in practice. TruffleRuby uses Java for particularly performance sensitive parts. JSC relies on calling C++ or "intrinsics" which are hand-written IR snippets of code to JIT.
Well, most of PHP is calling into C functions (all the standard libs), and as for the rest, a JIT can absolutely be as fast as C (or faster, due to micro-optimizations, profiling, knowledge of non-aliasing, etc.).
[1] https://www.oilshell.org