HN Mirror


by r0l1 1098 days ago

https://peps.python.org/pep-0703/

Quote: "In PyTorch, Python is commonly used to orchestrate ~8 GPUs and ~64 CPU threads, growing to 4k GPUs and 32k CPU threads for big models. While the heavy lifting is done outside of Python, the speed of GPUs makes even just the orchestration in Python not scalable. We often end up with 72 processes in place of one because of the GIL. Logging, debugging, and performance tuning are orders-of-magnitude more difficult in this regime, continuously causing lower developer productivity."

Quote: "We frequently battle issues with the Python GIL at DeepMind. In many of our applications, we would like to run on the order of 50-100 threads per process. However, we often see that even with fewer than 10 threads the GIL becomes the bottleneck. To work around this problem, we sometimes use subprocesses, but in many cases the inter-process communication becomes too big of an overhead. To deal with the GIL, we usually end up translating large parts of our Python codebase into C++. This is undesirable because it makes the code less accessible to researchers."
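For readers who want to see the effect the second quote describes, here is a minimal sketch, not taken from the PEP. It assumes a standard CPython build with the GIL enabled; `count_primes`, `run_sequential`, and `run_threaded` are hypothetical names chosen for illustration. The idea is to run the same pure-Python, CPU-bound function once in a plain loop and once across a thread pool. On a GIL build the two timings are usually close, because only one thread executes Python bytecode at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """Pure-Python, CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(limits):
    """Run every job one after another; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [count_primes(x) for x in limits]
    return results, time.perf_counter() - start


def run_threaded(limits, workers=4):
    """Run the same jobs on a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(count_primes, limits))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    limits = [20_000] * 4  # four identical CPU-bound jobs (arbitrary size)
    seq_results, seq_time = run_sequential(limits)
    thr_results, thr_time = run_threaded(limits)
    print(f"sequential: {seq_time:.2f}s  threaded: {thr_time:.2f}s")
```

Exact timings depend on the machine and interpreter, so none are claimed here. Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` is the subprocess workaround the quote mentions, at the cost of pickling arguments and results between processes.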

2 comments

miraculixx 1098 days ago

This requirement could have been well served by a GIL per thread and an arena-based (shared) object allocation model. Every other use case would have been unaffected.

Now we change the world for everyone and put most library developers through a valley of desperation for 5+ years, just so that a few narrow use cases get the benefits they want.

Not a smart move IMHO.


r0l1 1098 days ago

Good point. Did the Meta and DeepMind devs really miss this?

I try to avoid Python as much as possible, because I mainly work with Go & C++, and multi-threading in those languages is just better (imho). Bringing Python a step forward and making it future-proof might be a good thing, even if this means breaking some things? Not sure if removing the GIL is the right step, but there is a big performance gap to fix. Or maybe the AI community must move to a better-suited language? Having Python code in production just feels so wrong, especially if a rewrite in another language shows the performance gap.


miraculixx 1097 days ago

The PEP notes subinterpreters as an alternative and says they can be considered a valid approach to achieving parallelism. However, it does not discuss why nogil was given preference. I guess that's OK because the PEP is about nogil.

I'm not sure whether the SC has considered alternative approaches, but it would be surprising if not.


samus 1098 days ago

The use cases of the ML and AI world are very important though, as they massively contribute to Python's popularity. Thanks to Python, researchers and developers don't have to use different languages and library ecosystems for developing and scaling models.

Alas, subinterpreters sound like they could be a feasible solution for many use cases as well.


tgv 1098 days ago

And they couldn't switch to another language? It sounds really odd to me, too odd to be a justification for this change.
