Hacker News new | ask | show | jobs
by johnjr 693 days ago
I just wrote a post about how the Cpython is much faster without GIL:https://news.ycombinator.com/item?id=40988244
1 comments

I mean, only the threaded version, which is expected. For tons of cases Python without the GIL is not just slower, but significantly slower; "somewhere from 30-50%" according to one of the people working on this: https://news.ycombinator.com/item?id=40949628

All of this is why the GIL wasn't removed 20 years ago. There are real trade-offs here.

30-50% is an understatement. The latest beta is more than 100% slower in a simple benchmark:

https://news.ycombinator.com/item?id=41019626

How is single-threaded code slower without GIL?
Because in the --disable-gil build data structures like ref-counting, dicts, freelists, etc. are locked, even when there is only a single thread.

This is the reason why previous attempts were rejected. But those attempts came from single individuals and not from a photo sharing website.

This matters if --disable-gil becomes the default in the future and is forced on everyone.

That cannot be the reason for a 30-50% slowdown. Uncontested locks are very fast.
They may be fast in C++, but not in the context of CPython. Here are the dirty details. Note that fine-grained locking has also been tried before:

https://dabeaz.blogspot.com/2011/08/inside-look-at-gil-remov...

Thanks for the link, that's an interesting read. Actually the referenced PyMutex is a good old pthread_mutex_t, the same you'd use in C or C++. But I shouldn't have written so surely. Although uncontested locks are very fast, if the loop is tight enough, adding locks will be significant.

However, PEP 703 specifically points out that performance-critical container operations (__getitem__/iteration) avoid locking, so I'm still highly skeptical that those locks are the cause of the 30-50%.

https://peps.python.org/pep-0703/#optimistically-avoiding-lo...

One of the things this spends some time on that was already obsolete in 2011 is using a pool of locks. In 1994 locks are a limited OS resource, Python can't afford to sprinkle millions of them in the codebase. But long before 2011 Linux had the futex, so locks only need to be aligned 32-bit integers. In 2012 Windows gets a similar feature but it can do bytes instead of 32-bit integers if you want.

If a Linux process wants a million locks that's fine, that's just 4MB of RAM now.