|
|
|
|
|
by r0l1
1051 days ago
|
|
https://peps.python.org/pep-0703/ Quote: "In PyTorch, Python is commonly used to orchestrate ~8 GPUs and ~64 CPU threads, growing to 4k GPUs and 32k CPU threads for big models. While the heavy lifting is done outside of Python, the speed of GPUs makes even just the orchestration in Python not scalable. We often end up with 72 processes in place of one because of the GIL. Logging, debugging, and performance tuning are orders-of-magnitude more difficult in this regime, continuously causing lower developer productivity." Quote: "We frequently battle issues with the Python GIL at DeepMind. In many of our applications, we would like to run on the order of 50-100 threads per process. However, we often see that even with fewer than 10 threads the GIL becomes the bottleneck. To work around this problem, we sometimes use subprocesses, but in many cases the inter-process communication becomes too big of an overhead. To deal with the GIL, we usually end up translating large parts of our Python codebase into C++. This is undesirable because it makes the code less accessible to researchers." |
|
Now we change the world for everyone and put most of library developers through a valley of desperation for 5 years+, just so that a very few narrow use cases get the benefits they want.
Not a smart move IMHO.