HN Mirror



	by allanren 680 days ago
	It's good to see Python finally able to get rid of the GIL. Looking forward to seeing how much it can improve performance.


fastball 680 days ago

We already know the upper bound of perf improvement – existing perf * number of cores. It will be worse than that though, as all the GILectomy plans make single-threaded performance worse.

So if you're expecting something better than that, you will be disappointed.
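
The bound in this comment is simple arithmetic, sketched below. `throughput_upper_bound` is a hypothetical helper, and `single_thread_penalty` is an illustrative placeholder for the per-thread slowdown of a no-GIL build, not a measured figure.

```python
def throughput_upper_bound(single_thread_ops: float, cores: int,
                           single_thread_penalty: float = 0.0) -> float:
    """Best case for a perfectly parallel, CPU-bound workload.

    single_thread_ops: throughput of one thread on today's GIL build.
    cores: number of cores the threads can run on simultaneously.
    single_thread_penalty: hypothetical fractional slowdown (0.0 to <1.0)
        of one thread on a no-GIL build; illustrative only.
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if not 0.0 <= single_thread_penalty < 1.0:
        raise ValueError("single_thread_penalty must be in [0, 1)")
    # Each thread runs somewhat slower, and at most `cores` run at once.
    return single_thread_ops * (1.0 - single_thread_penalty) * cores


print(throughput_upper_bound(100.0, 8))        # 800.0 (existing perf * cores)
print(throughput_upper_bound(100.0, 8, 0.25))  # 600.0 (hypothetical 25% penalty)
```

Real workloads land below this ceiling because of serial sections and contention; the sketch only captures the "perf * cores, minus the single-thread cost" argument.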

vlovich123 680 days ago

All the GILectomy plans, IIRC, also include single-threaded performance improvements to offset any such costs. So while single-threaded performance without the GIL may be worse than with it on the same Python version, it will still be ahead of where single-threaded Python is today (assuming everything goes according to plan). That's also why multi-threaded performance will be more than just existing perf * number of cores (relative to today, not to what removing the GIL alone provides).

eviks 680 days ago

But it doesn't offset anything, since you get all the other improvements anyway; they're not tied to GIL/no-GIL.

vlovich123 680 days ago

I could be misremembering, but I thought that the MSFT team proposed those performance improvements specifically to offset any concerns about single-threaded performance degradation from removing the GIL. So even if development is happening in parallel by independent teams (which I thought it wasn't; I thought it was all one team doing this work), it was predicated on nogil being accepted in the first place. Thus, if the GIL were to remain in Python, this performance work wouldn't be happening.

eviks 680 days ago

Maybe the work wouldn't be happening without the no-GIL work, but once it has happened it's not tied to the GIL; you can pick up those improvements and continue with a GIL-only Python.

vlovich123 680 days ago

This post is literally about step 1: add this behind an unsupported experimental flag to get more insights. Step 2, in the mid term, is to make it a supported option based on readiness (within another 2 years). Step 3 is making it the default and then removing the GIL [1]. Steps 2 and 3 may not happen if some major unsolvable obstacle appears, but I doubt this direction will be easy to reverse. Given that MSFT is driving all of this right now, it's hard to imagine there's going to be much appetite to break their trust. MSFT is more likely to cut funding before completion (which would create some chaos) than the steering committee is to violate an agreement around funding. MSFT has made specific long-term commitments they're going to keep, but those commitments are only for a few years, IIRC.

[1] https://developer.vonage.com/en/blog/removing-pythons-gil-it...
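
For step 1, the experimental no-GIL mode ships as a separate CPython 3.13 build (configured with `--disable-gil`, commonly installed as `python3.13t`). A minimal sketch for checking which mode you are running, assuming CPython; the helper names are hypothetical. `sysconfig.get_config_var("Py_GIL_DISABLED")` and `sys._is_gil_enabled()` are the standard-library hooks for this, the latter only on 3.13+, hence the fallback.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    # Py_GIL_DISABLED is 1 on interpreters built with --disable-gil;
    # it is 0 or None on regular builds and older versions.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_enabled_at_runtime() -> bool:
    # sys._is_gil_enabled() exists from CPython 3.13. Older interpreters
    # always run with the GIL, so default to True when it is missing.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


if __name__ == "__main__":
    print("free-threaded build:", is_free_threaded_build())
    print("GIL enabled right now:", gil_enabled_at_runtime())
```

The two checks can disagree: a free-threaded build may still turn the GIL back on at runtime (for example via `PYTHON_GIL=1`, or when an extension module that hasn't declared free-threading support is imported).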

logicchains 680 days ago

The GIL causes a huge performance hit in data processing/ML by forcing the use of multiple processes, which leads to a lot of unnecessary memory copying between processes unless you put in considerable effort to share memory explicitly. So in some cases the savings will be gigantic, from no longer unnecessarily copying huge dataframes between processes.
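
The "effort to share memory explicitly" with only the standard library looks roughly like this. A minimal sketch: `publish`, `checksum_from`, and `demo` are hypothetical names, and for brevity the "worker" attaches from the same process. In real code `checksum_from` would run in a child process (e.g. started with `multiprocessing.Process`) that is handed only `shm.name` and the size.

```python
from multiprocessing import shared_memory


def publish(data: bytes) -> shared_memory.SharedMemory:
    # Copy the payload once into a new named shared-memory block.
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    shm.buf[: len(data)] = data
    return shm


def checksum_from(name: str, size: int) -> int:
    # Attach to an existing block by name and read it in place; a worker
    # process would do this instead of receiving a pickled copy.
    shm = shared_memory.SharedMemory(name=name)
    view = shm.buf[:size]  # the OS may round the block up, so slice to size
    try:
        return sum(view)  # iterating a byte memoryview yields ints
    finally:
        view.release()  # drop the slice before closing the mapping
        shm.close()


def demo(data: bytes) -> int:
    shm = publish(data)
    try:
        # Hypothetical stand-in for handing shm.name to a worker process.
        return checksum_from(shm.name, len(data))
    finally:
        shm.close()
        shm.unlink()  # free the block once every user is done with it
```

This is the bookkeeping (naming, sizing, closing, unlinking) that threads sharing one address space would not need.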

antupis 680 days ago

But usually, in spaces where you need speed, Python is just an orchestrator or glue between pipelines, and the actual calculations are done by a DB or some C/C++/Fortran library.

logicchains 680 days ago

Yes, pandas/NumPy call into C/C++ to do calculations efficiently, but the "glue" can still introduce a significant slowdown relative to that when it's copying tens of gigabytes of dataframes unnecessarily between processes. Of course, that slow part could also be moved to C++, but that's much more effort than just parallel-mapping over the dataset in Python with no copying/multiprocessing, as will be possible with no-GIL.
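
The "parallel map over the dataset with no copying" idea can be sketched with the standard library's `ThreadPoolExecutor`. Assumptions: a plain Python list stands in for a dataframe, and `chunk_bounds`/`parallel_sum_of_squares` are hypothetical helpers. On a GIL build this code is correct but the threads take turns; the premise of the thread is that a free-threaded build lets pure-Python work like this spread across cores.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n: int, parts: int) -> list[tuple[int, int]]:
    # Split range(n) into `parts` contiguous (start, stop) ranges.
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_sum_of_squares(values: list[float], workers: int = 4) -> float:
    # All threads index into the same `values` list: nothing is pickled or
    # copied into another process, unlike multiprocessing.Pool.map.
    def work(bounds: tuple[int, int]) -> float:
        start, stop = bounds
        return sum(values[i] * values[i] for i in range(start, stop))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, chunk_bounds(len(values), workers)))
```

The workers only read shared data here; concurrent writes to shared objects would still need locks or per-thread results, with or without the GIL.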

aragilar 680 days ago

Bad code/quick hacks will always be slow (but can be great for prototypes), and sometimes it's worth planning how you're going to process something rather than piling on multiprocessing. Once you reach the point of multigigabyte IPC, it's worth spending the time doing it right.

robertlagrant 680 days ago

Building libraries on a GIL-less Python would enable people to access that power without them all building it from scratch themselves.

graemep 680 days ago

If the libraries are thread-safe, can they not release the GIL to avoid copying?

I am pretty sure you are going to say there is a reason this cannot be done, would just like to know what it is!
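
Releasing the GIL does work for code that spends its time inside C without touching Python objects. The standard library's `hashlib` is one documented case: it releases the GIL while hashing inputs larger than 2047 bytes, so plain threads can hash several buffers at once even on a regular build. A minimal sketch with hypothetical helper names; the limitation raised in the reply is that the Python-level glue around such calls still runs one thread at a time.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(block: bytes) -> str:
    # For inputs over 2047 bytes, hashlib drops the GIL while hashing,
    # letting other threads run Python or C code in the meantime.
    return hashlib.sha256(block).hexdigest()


def hash_blocks(blocks: list[bytes], workers: int = 4) -> list[str]:
    # Threads read `blocks` directly; no inter-process copying is involved.
    # pool.map preserves input order in its results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256_hex, blocks))
```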

logicchains 680 days ago

What libraries? If you're writing some pandas code and want to parallelise part of your data pipeline, as far as I'm aware pandas doesn't have much support for that; you need to manually use multiprocessing to process different parts of the dataframe in different processes. Yes, there are pandas alternatives that claim to be drop-in replacements with better parallelism support, but the more pandas features you use, the more likely you are to depend on something they don't support, meaning you need to rewrite some code to switch to them.

tgv 680 days ago

But that's such a small fraction of total Python use that it cannot justify making it the default.

graemep 680 days ago

It is a fraction of usage that is commercially important to people who fund a lot of Python development.

tgv 679 days ago

Aka a power grab for short-term gain.

hyperbrainer 679 days ago

I would use Python much more if every version did not have so many breaking changes, especially with the removal of the GIL. Shame they did not learn from the 2-to-3 transition.