by allanren 680 days ago
It's good to see Python finally able to get rid of the GIL. Looking forward to seeing how much it can improve performance.
2 comments

We already know the upper bound of perf improvement – existing perf * number of cores. It will be worse than that though, as all the GILectomy plans make single-threaded performance worse.

So if you're expecting something better than that, you will be disappointed.
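To put rough numbers on that bound, here is a minimal sketch. The helper name `ideal_speedup` and the 10% single-threaded slowdown are made up for illustration, not measurements of any real build, and perfect scaling across cores is assumed.

```python
def ideal_speedup(cores: int, single_thread_overhead: float) -> float:
    """Best-case speedup over today's single-threaded perf.

    Assumes perfect scaling across cores. `single_thread_overhead` is the
    fraction of single-threaded speed lost without the GIL (a hypothetical
    input, e.g. 0.10 for a 10% slowdown).
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if not 0.0 <= single_thread_overhead < 1.0:
        raise ValueError("overhead must be in [0, 1)")
    # Each core runs at (1 - overhead) of today's single-threaded speed.
    return cores * (1.0 - single_thread_overhead)


# Illustrative only: 8 cores with an assumed 10% single-threaded slowdown.
print(round(ideal_speedup(8, 0.10), 2))
```

With zero overhead the result is just the core count, which is the upper bound described above; any overhead pulls it below that.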

All the GILectomy plans, IIRC, also include single-threaded performance improvements to offset any such costs. So while single-threaded performance may be worse without the GIL than with it on the same Python version, it should still be ahead of where it is today (assuming everything goes according to plan). That's also why multi-threaded performance will be more than just existing perf * number of cores (measured against today's Python, not against what removing the GIL alone provides).
But it doesn't offset anything, since you get all the other improvements anyway. They're not tied to gil/nogil.
I could be misremembering, but I thought that the MSFT team proposed those performance improvements specifically to offset any concerns about single-threaded performance degradation from removing the GIL. Thus even if development is happening in parallel by independent teams (which I thought it wasn't; I thought it was all one team doing this work), it was predicated upon nogil being accepted in the first place. Thus if the GIL were to remain in Python, this performance work wouldn't be happening.
Maybe the work wouldn't be happening without the noGIL work, but once it's happened it's not tied to the GIL. You can pick those improvements and continue with a GIL-only Python.
This post is literally about step 1: add this behind an unsupported experimental flag to get more insights. Step 2, in the mid-term, is to make it a supported option based on readiness (within another 2 years). Step 3 is making it the default and then removing the GIL [1]. Steps 2 and 3 may not happen if some major unsolvable obstacle appears. But I doubt it's going to be so easy to reverse this direction. Given MSFT is driving all of this right now, it's hard to imagine there's going to be much appetite to break their trust; MSFT cutting funding before completion, which would create some chaos, is more likely than the steering committee violating an agreement around funding (MSFT has made specific long-term commitments they're going to keep, but those commitments are only for a few years IIRC).

[1] https://developer.vonage.com/en/blog/removing-pythons-gil-it...
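For anyone who wants to check which mode their interpreter is in, here is a minimal sketch. It assumes CPython 3.13 or later for the free-threaded case, where `sys._is_gil_enabled()` and the `Py_GIL_DISABLED` build variable exist; on older versions it falls back to reporting a standard build. The function name `gil_status` is made up for this example.

```python
import sys
import sysconfig


def gil_status() -> str:
    """Describe whether this interpreter was built free-threaded and whether the GIL is on."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    if not free_threaded_build:
        return "standard build (GIL always on)"
    # sys._is_gil_enabled() was added in 3.13; assume the GIL is on if it is missing.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    return "free-threaded build, GIL " + ("enabled" if gil_enabled else "disabled")


print(gil_status())
```

Even on a free-threaded build the GIL can be switched back on at runtime, which is why the sketch checks both the build flag and the runtime state.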

The GIL causes a huge performance hit in data processing/ML by forcing the use of multi-process, which leads to a bunch of unnecessary copying of memory between processes unless you put in a bunch of effort to explicitly share memory. So in some cases the savings will be gigantic, from no longer unnecessarily copying huge dataframes between processes.
But usually, in spaces where you need speed, Python is just an orchestrator or glue between pipelines, and the actual calculations are done by a DB or some C/C++/Fortran library.
Yes, pandas/numpy calls C++ to do calculations efficiently, but the "glue" can still introduce significant slowdown relative to that when it's copying tens of gigabytes of dataframe unnecessarily between processes. Of course that slow part itself could also be moved to C++, but that's much more effort than just parallel mapping over the dataset in Python with no copying/multiprocessing, as will be possible with no-gil.
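A minimal stdlib-only sketch of the difference, with a plain list standing in for a large dataframe (the names `data`, `chunk_sum` and `parallel_sum` are made up for this example). Threads read the same object in place, whereas anything passed as an argument to a worker process has to be pickled first. On a build with the GIL the threads here won't actually run in parallel; the point is only that no copy is needed.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a large in-memory dataset (hypothetical; a real case would be a dataframe).
data = list(range(200_000))


def chunk_sum(bounds):
    """Sum one index range of the shared list without copying the list."""
    start, stop = bounds
    return sum(data[i] for i in range(start, stop))


def parallel_sum(n_workers=4):
    """Map chunk_sum over index ranges; every thread sees the same `data` object."""
    step = len(data) // n_workers
    bounds = [
        (i * step, len(data) if i == n_workers - 1 else (i + 1) * step)
        for i in range(n_workers)
    ]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(chunk_sum, bounds))


# Bytes that would be serialized if `data` were passed as an argument to a worker process.
ipc_bytes = len(pickle.dumps(data))

print(parallel_sum() == sum(data), ipc_bytes)
```

Only the small `(start, stop)` tuples are handed to the workers, so the cost of dispatching work stays constant no matter how big `data` gets.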
Bad code/quick hacks will always be slow (but can be great for prototypes), and sometimes it's worth planning how you're going to process something rather than piling on multiprocessing. Once you reach the point of multigigabyte IPC, it's worth spending the time doing it right.
Building libraries on a GIL-less Python would enable people to access that power without them all building it from scratch themselves.
If the libraries are thread safe, can they not release the GIL to avoid copying?

I am pretty sure you are going to say there is a reason this cannot be done, would just like to know what it is!
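For context, C extensions can already release the GIL around long-running native calls. A minimal sketch using the standard library's `hashlib`, which per its documentation releases the GIL while hashing inputs larger than a couple of kilobytes; the buffer sizes and the `digest` helper are made up for this example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Four 1 MB buffers, large enough that hashlib releases the GIL while hashing them.
buffers = [bytes([i]) * 1_000_000 for i in range(4)]


def digest(buf: bytes) -> str:
    """Hash one buffer; the heavy lifting happens in C."""
    return hashlib.sha256(buf).hexdigest()


# The threads share `buffers` in place; nothing is copied between workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, buffers))

serial = [digest(b) for b in buffers]
print(threaded == serial)
```

This only helps while execution stays inside the native call; any Python-level code around it still runs under the GIL.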

What libraries? If you're writing some pandas code and want to parallelise some part of your data pipeline, as far as I'm aware pandas doesn't have much support for that; you need to manually use multiprocessing to process different parts of the dataframe in different processes. Yes, there are pandas alternatives that claim to be a drop-in replacement with better parallelism support, but the more pandas features you use, the more likely you are to depend on something they don't support, meaning you need to rewrite some code to switch to them.
But that's such a small fraction of total Python use that it cannot serve as justification for making it the default.
It is a fraction of usage that is commercially important to people who fund a lot of Python development.
Aka a power grab for short-term gain.
I would use Python much more if every version did not have this many breaking changes, especially with the removal of the GIL. Shame they did not learn from 2 to 3.