Does anyone know why, for example, the Ruby team is able to create performant JITs with comparative ease relative to Python? They are in many ways similar languages, but Python has 10x the developers at this point.
Ruby in both its semantics and implementation is very close to Smalltalk and does not really use Python's object model that can be summarized as "everything is a dict with string keys". That makes all the tricks discovered over the last 40 years for making Smalltalk and Lisp fast much more directly applicable to Ruby.
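To make the "everything is a dict with string keys" summary concrete, here is a minimal sketch of what it looks like at the language level. The `Point` class and the `norm1` method are made up for illustration, and this only shows the visible semantics, not how CPython lays objects out internally.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)

# Instance attributes are exposed as a per-object dict keyed by attribute name.
print(p.__dict__)                      # {'x': 1, 'y': 2}

# Writing through the dict is visible as an ordinary attribute.
p.__dict__["z"] = 3
print(p.z)                             # 3

# Methods are looked up in the class namespace, also keyed by string.
print("__init__" in Point.__dict__)    # True

# A class can gain methods at runtime, and existing instances see them.
Point.norm1 = lambda self: abs(self.x) + abs(self.y)
print(p.norm1())                       # 3
```

Any attribute access can in principle be affected by a mutation like these, which is the kind of thing a JIT has to guard against.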
It's easy to dismiss our efforts, but Ruby is just as dynamic if not more than Python. It's also a very difficult language to optimize. I think we could have done the same for Python. In fact the Python JIT people reached out to me when they were starting this project. They probably felt encouraged seeing our success. However they decided to ignore my advice and go with their own unproven approach.
This is probably going to be an unpopular take but building a good JIT compiler is hard and leadership matters. I started the YJIT project with 10+ years of JIT compiler experience and a team of skilled engineers, whereas AFAIK the Python JIT project was led by a student. It was an uphill battle getting YJIT to work well at first. We needed grit and I pushed for a very data-driven approach so we could learn from our early failures and make informed decisions. Make of that what you will.
Yes, Python is hard to optimize. I still believe that a good JIT for CPython is very possible but it needs to be done right. Hire me if you want that done :)
> whereas AFAIK the Python JIT project was led by a student.
I am definitely not leading the team! I am frankly unqualified to do so lol. The team is mostly led by Mark Shannon, who has 10+ years of compiler/static analysis experience as well. The only thing I initially led was the optimizer implementation for the JIT. The overall design decisions (to choose tracing, to use copy and patch, etc.) were made by other people.
> However they decided to ignore my advice and go with their own unproven approach.
Your advice was very much appreciated and I definitely didn't ignore your advice. I just don't have much say over the initial architectural choices we make. We're slowly changing the JIT based on data, but it is an uphill battle like you said. If you're interested, it's slowly becoming more like lazy basic block versioning https://github.com/python/cpython/issues/128939
You did great work on YJIT, and I am quite thankful for that.
Thanks for the response Maxime, your work on YJIT is astounding. The speedup from YJIT was a huge improvement over CRuby or MJIT, and the work was done relatively quickly compared to Python, which seems to always be talking about this JIT while we never see a comparable release.
Having had no experience in JIT development but having followed the faster cpython JIT progress on a weekly basis, I do find their JIT strategy a bit weird. The entire decision seemed to revolve around not wanting to embed an external JIT/compiler with all that entails...
At first I thought their solution was really elegant. I have an appreciation for their approach, and I could have been tempted to choose it myself. But at this point I think this is a sunk cost fallacy. The JIT is not close to providing significant improvements and no one in the faster cpython community seems to be able to make the call that the foundational approach may not be able to give optimal results.
I either hope to be wrong or hope that faster cpython management has a better vision for the JIT than I do.
Smalltalk is highly dynamic, and it keeps surprising me that Python gets put into some kind of special place as an excuse for why people keep failing at JIT adoption.
Everything is a message, the metaclasses that define object shapes can change any time someone feels like it, there are methods like become: that completely replace an object across all its references in the running image, you can break into the debugger and redo after whatever was changed while in the debugger, code can be loaded over the network...
> Python's object model that can be summarized as "everything is a dict with string keys".
Given this "it's dicts all the way down" nature of CPython, I'm curious if the recent hash table theoretical breakthrough[1] discussed here[2] a few months ago may eventually help make it much faster, given the compounding of dict upon dict?
My complete _guess_ (in which I make a bunch of assumptions!) is that generally it seems like the Ruby team has been more willing to make small breaking changes, whereas it seems a lot like the Python folks have become timid in those regards after the decade of transition from 2 -> 3.
> Python has made many breaking changes after 2->3 as well.
Aside from the `async` keyword (experience with which seems like it may have driven the design of "soft keywords" for `match` etc.), what do you have in mind that's a language feature as opposed to a standard library deprecation or removal?
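For anyone unfamiliar with the distinction, here is a small sketch of the difference between the `async` keyword and a soft keyword like `match`. It assumes Python 3.7 or later for the `async` behavior; the variable names are just for illustration.

```python
import keyword

# `async` is a fully reserved keyword, so old code using it as a name broke.
print(keyword.iskeyword("async"))   # True

# `match` is not in the reserved list; it is only special at the start of a
# match statement and stays a valid identifier everywhere else.
print(keyword.iskeyword("match"))   # False

# Assigning to `async` no longer even compiles.
try:
    compile("async = 1", "<example>", "exec")
    async_is_assignable = True
except SyntaxError:
    async_is_assignable = False
print(async_is_assignable)          # False

# Existing code that used `match` as a variable name keeps working.
match = {"status": 200}
print(match["status"])              # 200
```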
Yes, the bytecode changes with every minor version, but that's part of their attempts to improve performance, not a hindrance.
Why do you exclude the standard library like it's a small thing? If it's not part of the language, why do they host the documentation on the same website and ship it with the same package?
In C, dotnet, Rust or even Javascript, stdlib breakages are basically the same as language breakages. Python is an outlier for this.
Minor breaking changes in the standard-lib are normal in Python. They have always been there, and are usually communicated over a long timeframe, so people have enough time to prepare. The point is, they usually won't affect many people, unlike major breaking changes in the language itself.
The standard library is maintained and delivered by the CPython-team itself, but at the end of the day it's just a better maintained 3rd-party-collection you have to trust over your own code, and it's not much different in that regard than any other 3rd-party python-code on pip.
> In C, dotnet, Rust or even Javascript, stdlib breakages are basically the same as language breakages. Python is an outlier for this.
That might be because C and Javascript have no serious standard-lib. For python it's more comparable to the builtins, what they deliver out of the box.
Smalltalk, Self, and Lisp are highly dynamic, and their JIT research is the genesis of modern JIT engines.
For some strange reason, the Python community would rather learn C and call it "Python" instead of focusing on why languages that are just as dynamic already managed this a few decades ago.
Python has a longtime connection to C and somehow I think Python is a dynamic language in the C tradition, not of any other community, which is maybe the same thing you are hinting at.
Hard to put a finger on what exactly, but Python has never been so interested in purity, rather in pragmatic functionality and it ends up in a place where it gives access to C style idioms and API, see for example the os module.
Maybe it is that Python doesn't have its own model of the world, but it provides a dynamic language facade to the C model of the world.
The funding is one angle, but the Shopify Ruby team isn't that big (<10 people iirc). Python is used extensively at just about every tech company, and Meta, Apple, Microsoft, Alphabet, and Amazon each have at least 10x as many engineers as Shopify. This makes me think that there must be some kind of language/ecosystem reason that makes Python much harder than Ruby to optimize.
I may not be completely accurate on this because there's not a whole lot of information on how Python is doing their thing so...
The way (I believe) Python is doing it is to take code templates and stitch them together (copy & patch compilation) to create an executable chunk of code. If, for example, one were to take the Python bytecode and just stitch all the code chunks together, all you can realistically expect to save is the instruction dispatch, which the compiler should already make really fast. That leaves you roughly at parity with the interpreter: each code chunk is inherently independent, so the compiler can't do its magic across the whole stitched sequence. Basically this is just inlining the bytecode operations.
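A toy way to see the point, with the big caveat that this is source-level template stitching in Python and not real machine-code copy & patch: the opcodes, `TEMPLATES`, and `stitch` are all made up for illustration. Each opcode has a pre-written template with a hole for its operand; the "JIT" copies the templates in order, patches the holes, and compiles the result once.

```python
# Toy stack bytecode: (opcode, operand) pairs. Hypothetical, not CPython's.
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("MUL", None), ("PUSH", 4), ("ADD", None)]

def interpret(program):
    """Baseline: decode and dispatch every instruction at run time."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()

# One pre-written template per opcode; {arg} is the hole patched per instruction.
TEMPLATES = {
    "PUSH": "stack.append({arg})",
    "MUL": "b = stack.pop(); a = stack.pop(); stack.append(a * b)",
    "ADD": "b = stack.pop(); a = stack.pop(); stack.append(a + b)",
}

def stitch(program):
    """Copy each opcode's template, patch its hole, and compile the result once."""
    lines = ["def compiled():", "    stack = []"]
    for op, arg in program:
        lines.append("    " + TEMPLATES[op].format(arg=repr(arg)))
    lines.append("    return stack.pop()")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["compiled"]

compiled = stitch(PROGRAM)
print(interpret(PROGRAM), compiled())   # 10 10
```

The stitched function no longer has the `if`/`elif` dispatch, but every operation still pushes and pops the stack exactly as before. Nothing has been optimized across templates, which is the parity problem described above.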
To make a JIT compiler really excel you'd need to do something like take all the individual operations of each individual opcode and lower that to an IR and then optimize over the entire method using all the bells and whistles of modern compilers. As you can imagine this is a lot more work than 'hacking' the compiler into producing code fragments which can be patched together. Modern compilers are really good at these sorts of things and people have been trying to make the Python interpreter loop as efficient as possible for a long time so there's a big hurdle to overcome here.
I (or more accurately, Claude) have been writing a bytecode VM, and the dispatch loop is basically just a pointer dereference and a function call, which is about as fast as you can get. OK, that's how it works in theory: there's also a check to make sure the opcode is within range, since the compiler part is still being worked on and the check is good for debugging, but foundationally this is how it works.
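Roughly, the shape of that loop looks like the sketch below. This is not the actual VM, just a Python approximation under my own assumptions: the opcode numbering, the handler names, and the `VM` class are all hypothetical, and the list index stands in for the pointer dereference.

```python
PUSH, ADD, MUL, HALT = range(4)

def op_push(vm, arg):
    vm.stack.append(arg)

def op_add(vm, arg):
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append(a + b)

def op_mul(vm, arg):
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append(a * b)

def op_halt(vm, arg):
    vm.running = False

# The opcode number is the index into this table.
HANDLERS = [op_push, op_add, op_mul, op_halt]

class VM:
    def __init__(self, code):
        self.code = code
        self.pc = 0
        self.stack = []
        self.running = True

    def run(self):
        while self.running:
            opcode, arg = self.code[self.pc]
            self.pc += 1
            # Debug-time range check; a trusted compiler would make it redundant.
            if not 0 <= opcode < len(HANDLERS):
                raise ValueError(f"opcode {opcode} out of range")
            # The whole dispatch: one table lookup and one call.
            HANDLERS[opcode](self, arg)
        return self.stack[-1]

code = [(PUSH, 2), (PUSH, 3), (MUL, None), (PUSH, 4), (ADD, None), (HALT, None)]
print(VM(code).run())   # 10
```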
From what I've gleaned from the literature, the real key to making something like copy & patch work is super-instructions. You take common patterns, like MULT+ADD, and mash them together so the C compiler can do its magic. This was maybe mentioned in the copy & patch paper or, perhaps, they only talked about specialization based on types; I don't actually remember.
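As a sketch of what a super-instruction looks like on a toy stack machine (the opcode names, `fuse`, and `run` are hypothetical, and this assumes straight-line code with no jumps landing between the two fused instructions):

```python
def fuse(program):
    """Peephole pass: rewrite each MUL immediately followed by ADD as MULADD."""
    out = []
    i = 0
    while i < len(program):
        is_pair = (
            program[i] == ("MUL", None)
            and i + 1 < len(program)
            and program[i + 1] == ("ADD", None)
        )
        if is_pair:
            out.append(("MULADD", None))
            i += 2
        else:
            out.append(program[i])
            i += 1
    return out

def run(program):
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MULADD":
            # One handler, one dispatch, and no intermediate push/pop of a * b.
            b, a, c = stack.pop(), stack.pop(), stack.pop()
            stack.append(c + a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()

program = [("PUSH", 4), ("PUSH", 2), ("PUSH", 3), ("MUL", None), ("ADD", None)]
fused = fuse(program)
print(fused[-1], run(program), run(fused))   # ('MULADD', None) 10 10
```

In a real implementation the fused handler would be a single C function, so the C compiler gets to see the multiply and the add together.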
So, yeah, if you were just competing against a basic tree-walking interpreter then copy & patch would blow it out of the water, but C compilers and the Python interpreter have both had millions of person-hours put into them, so that's really tough competition.