by nullspace 1042 days ago
What a great article from LWN. It was well worth reading. As someone who was excited about Sam Gross's NoGIL work when it was first posted here, I think I'm beginning to change my mind after reading this article and reflecting on my own personal experiences.

My experience is with writing backend systems in several different languages (including Python) at various volume/latency/throughput levels. I've basically worked on only two types of systems -

1. One that exposes some sort of endpoint to the network: it accepts requests of some kind, does computation and other network requests, and sends a response of some kind (including long polling, WebSockets, etc.).

2. One that reads a message from a "queue" (could be a database, could be based on polling another API, etc.), does computation/network calls, and sends the results on to other queues.

Nothing else. Huge variance in specific requirements, but that's it. For the first type of system, latency matters more. For the second system, throughput matters more.

For the first type of system, I want to be able to spin up threads in response to requests, without worrying that an endpoint is too computationally heavy and might block others. I want to be able to share connections to databases in a shared pool. NoGIL would be useful here.

For the second type of system, I can't remember the last time where I wrote one where I had in-process parallelism/concurrency with shared resources (even in langs where there's no GIL). It would just get too confusing and hard to reason about. Any optimizations were mostly based on intelligent batching. For parallelism, you'd just have multiple _completely_ independent processes, probably across multiple machines.
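A minimal sketch of what "intelligent batching" in a single-threaded queue consumer might look like. The names `batches`, `handle_batch`, and `run_worker` are made up for illustration, and `handle_batch` is only a stand-in for one bulk write or downstream call. Parallelism would come from running several copies of this process, not from threads inside it.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(messages: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of up to `size` messages until the source is exhausted."""
    it = iter(messages)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def handle_batch(batch: List[str]) -> int:
    # Hypothetical stand-in for one bulk insert or one downstream API call.
    return len(batch)


def run_worker(messages: Iterable[str], size: int = 3) -> int:
    """Single-threaded consumer loop; returns the number of bulk calls made."""
    calls = 0
    for batch in batches(messages, size):
        handle_batch(batch)
        calls += 1
    return calls
```

Seven messages with a batch size of three cost three bulk calls instead of seven individual ones, which is where most of the throughput gain tends to come from.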

I would absolutely be disappointed if NoGIL meant compromising on the quality of the second type of system here. In practice, most of my mental bandwidth today goes towards making the second type of system better.

5 comments

> For the second type of system, I can't remember the last time where I wrote one where I had in-process parallelism/concurrency with shared resources (even in langs where there's no GIL). It would just get too confusing and hard to reason about. Any optimizations were mostly based on intelligent batching. For parallelism, you'd just have multiple _completely_ independent processes, probably across multiple machines.

For myself, the prospect of no-gil is interesting, in that something like my Captain's Log application [0] can be free from it; for example, I currently use a QThread to implement a JournalParser, which is basically the program's "engine" - the parser constantly reads in game events from a player journal file generated by the game Elite: Dangerous (and Odyssey), and depending on the particular event, fires off a related custom QSignal, which is then processed by whichever slot (receiving function) is listening for a given Signal.

There are other places in that application where no GIL might be quite handy.

In other words, I can see where having no GIL can be useful for GUI applications like mine.

[0] https://captainslog.scarygliders.net/captains-log-2/

Your JournalParser sounds like it could be implemented using normal Python threads or an asyncio event loop without many performance problems. If I understand correctly, all it's doing is watching for events and posting a signal somewhere, so it doesn't sound like the kind of application that is CPU-bound.

I could. But I 100% take full advantage of Qt's signal and slot mechanism.

Also, in many ways, GUI applications written in Python are not so much CPU bound, but Python GIL bound. If you're writing a Python/Qt application, you have to take great care to ensure your GUI doesn't freeze when your program is performing, say, many database inserts; if you have some naive loop which performs some given operation, your nice Qt GUI will freeze right up until that operation is complete. Right now the solution is to perform such operations in, say, a QThread, and use Qt's signal/slot feature to blat a progress "report" to a handler in the `main` Python loop.
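The worker-plus-progress-report pattern can be sketched with only the standard library. This is an analogy, not Qt code: `queue.Queue` stands in for signal/slot delivery, `threading.Thread` stands in for a QThread, and `insert_rows` / `run_with_progress` are hypothetical names.

```python
import queue
import threading


def insert_rows(rows, progress: queue.Queue) -> None:
    """Worker thread: pretend to insert rows, reporting after each one."""
    for done, _row in enumerate(rows, start=1):
        # A hypothetical slow operation (e.g. a database insert) would go here.
        progress.put(done)
    progress.put(None)  # sentinel: work finished


def run_with_progress(rows) -> list:
    """Main thread: receives reports while the worker does the slow part.

    A GUI event loop would receive these via a slot rather than a blocking get().
    """
    progress: queue.Queue = queue.Queue()
    worker = threading.Thread(target=insert_rows, args=(rows, progress))
    worker.start()
    reports = []
    while True:
        done = progress.get()
        if done is None:
            break
        reports.append(done)  # a GUI would update a progress bar here
    worker.join()
    return reports
```

The main thread never runs the slow loop itself, which is the property that keeps a GUI responsive.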

So back to what I said - no-GIL is looking quite interesting to me. Whether or not Qt can take advantage of such will be a different matter.

I agree. UIs written in python could benefit massively from noGIL - complex / computational UIs especially.

As a hobbyist who uses python I don't think I'll be directly using concurrency in my code, but I'm betting that over time the standard library and popular external libraries will.

And that will raise everyone's code.

To take advantage of NoGIL you don’t necessarily need to use parallelism directly. But let’s say your web server or async task executor can be more efficient at sharing context between threads.

The GIL is a bottleneck in applications that are CPU bound, e.g. machine learning, so naturally the NoGIL project is not that interesting to people writing server applications.

Of course, one may argue that you probably should not write CPU bound programs in Python in the first place, but that's another story :)

A lot of Java server based applications are multi threaded, not CPU bound, and tend to do a lot of things that generally can't work in python because of the GIL. It's too simplistic to think of this as something only of interest for CPU bound stuff. A lot of what Java applications do is of course the thread per connection style processing that older java applications still do (more modern ones would use non blocking IO and green threads). But there are also background threads doing useful work or more complex requests that fork off asynchronous work across multiple CPUs and then aggregate the results back as the response. Java apps tend to have vastly more threads than CPU cores. The exception is when things are CPU bound; then you want to minimize the context switching and end up with a number that is close to the number of CPU cores.

The GIL is not about the CPU but about enabling those kinds of things. With the current GIL in place it's very simple: as soon as you hit the global lock, everything stops until it is released. It doesn't matter how many CPU cores you have, they'll be idling while one of them holds the lock. There's barely any point in even trying to do that with the GIL in place. Forget about sharing data between threads. Mostly that's done via queues or databases in python. Removing the GIL will revolutionize a few things in key use cases for python:

- data processing & ETL

- event driven server systems

- machine learning and data science systems

They can all benefit from this and that's the reason a lot of people are pushing for this. The short term performance losses are not inherent to removing the GIL but just a necessary evil while the python developers deal with fixing the bottlenecks and a few decades worth of technical debt.

I/O functions (may) internally release the GIL. If the GIL becomes a bottleneck, you are not I/O bound by definition.
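A small sketch of that point, using `time.sleep` as a stand-in for a blocking I/O call (it releases the GIL while waiting). The function names are made up for illustration.

```python
import threading
import time


def blocking_call(seconds: float) -> None:
    # time.sleep releases the GIL while waiting, as most blocking I/O calls do.
    time.sleep(seconds)


def timed_threads(n: int, seconds: float) -> float:
    """Run n blocking calls in n threads and return the wall-clock time."""
    threads = [
        threading.Thread(target=blocking_call, args=(seconds,)) for _ in range(n)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

Four threads each waiting 0.2 s should finish in roughly 0.2 s total, not 0.8 s, even with the GIL in place. Swap the sleep for a pure-Python computation and the waits stop overlapping, which is the case NoGIL is aimed at.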

However, you are certainly right that not all server applications are I/O bound. I was a bit sloppy there.

> For the second type of system, I can't remember the last time where I wrote one where I had in-process parallelism/concurrency with shared resources (even in langs where there's no GIL). It would just get too confusing and hard to reason about. Any optimizations were mostly based on intelligent batching. For parallelism, you'd just have multiple _completely_ independent processes, probably across multiple machines.

Interestingly I'm working on something like this right now and do have large shared resources which meant I had to abandon using a multiprocess strategy.

I don't see why it would be confusing though, provided the shared resources are read only.
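Roughly what I have in mind, as a sketch only: the table contents and the names `LOOKUP`, `score`, and `score_all` are invented for illustration. The resource is built once, wrapped in a read-only view, and then only ever read from worker threads.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType

# Hypothetical large lookup table, built once and never mutated afterwards.
_table = {n: n * n for n in range(1000)}
LOOKUP = MappingProxyType(_table)  # read-only view; item assignment raises TypeError


def score(keys):
    """Pure read path: no locks, because nothing writes to LOOKUP."""
    return sum(LOOKUP[k] for k in keys)


def score_all(chunks, workers: int = 4):
    """Score each chunk on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, chunks))
```

With the GIL these threads take turns on the CPU-bound part; without it they could run in parallel over the same table, with no copying between processes.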

For such applications, isn’t the parallelism count usually static and limited? I think there will be good benefits for distributed system frameworks for python. I'd agree.

A couple of years ago, I implemented in-process parallelism for a system I was maintaining at $JOB. I was happy the system was in Go and not Python. But it was an exception to the rule in my experience.