There's no need to pretend Python has virtues which it lacks. It's not a fast language. It's fast enough for many purposes, sure, but it isn't fast, and this work is unlikely to change that. Faster, sure, and that's great.
Although true, it doesn't mean they can't improve its performance.
Working with threads is a pain in Python. If you want to spawn +10-20 threads in a process, it can quickly become way slower than running a single thread.
Removing the GIL and refactoring some of the core will unlock levels of concurrency that are currently not feasible with Python. And that's a great deal, in my opinion. Well worth the trouble they're going through.
After a couple decades of coding, I can say that threading is better if it's tightly controlled, limited to usages of tight parallelism of an algorithm.
Where it doesn't work is in a generic worker pool where you need to put mutex locks around everything -- and then prod randomly deadlocks in ways the developer boxes can't recreate.
> After a couple decades of coding, I can say that threading is better if it's tightly controlled, limited to usages of tight parallelism of an algorithm.
This may be a case of violent agreement, but there are a few clear cases where multithreading is easily viable. The best case is some sort of parallel-for construct, even if you include parallel reductions, although there may need to be some smarts around how to do the reduction (e.g., different methods for reduce-within-thread versus reduce-across-thread). You can extend this to heterogeneous parallel computations, a general, structured fork-join form of concurrency. But in both cases, you essentially have to forbid inter-thread communication between the fork and the join points. There's another case you might be able to make work, where you have a thread act as an internal server that runs all requests to completion before attempting to take on more work.
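To make the parallel-for-with-reduction case concrete, here is a minimal sketch in Python. The helper names (`chunked`, `reduce_within_worker`, `parallel_map_reduce`) are made up for illustration, and it uses a thread pool only to show the structure: each worker folds its own slice privately, nothing is shared between the fork and the join, and the cross-worker reduction happens on a single thread afterwards. On a GIL build of CPython this shape will not speed up pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def chunked(items, n_chunks):
    """Split items into at most n_chunks contiguous slices."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def reduce_within_worker(chunk, fn, op, identity):
    """Fork side: each worker folds its own slice into a private accumulator."""
    acc = identity
    for x in chunk:
        acc = op(acc, fn(x))
    return acc


def parallel_map_reduce(items, fn, op, identity, workers=4):
    """Fork, run isolated workers, join, then combine the partial results."""
    chunks = chunked(list(items), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(
            lambda c: reduce_within_worker(c, fn, op, identity), chunks))
    # Join side: the cross-worker reduction runs on one thread only.
    return reduce(op, partials, identity)


# Sum of squares of 1..100, computed in four isolated slices.
total = parallel_map_reduce(range(1, 101), lambda x: x * x, operator.add, 0)
```

The point of the split is that `op` only has to be associative; the workers never touch each other's accumulators, so no locks are needed.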
What the paper you link to is pointing out, in short, is that message passing doesn't necessarily free you from the burden of shared-mutable-state-is-bad concurrency. The underlying problem is largely that communication between different threads (or even tasks within a thread) can only safely occur at a limited number of safe slots, and any communication outside of that is risky, be it an atomic RMW access, a mutex lock, or waiting on a message in a channel.
In what way? Threading, asyncio, tasks, event loops, multiprocessing, etc. are all complicated and interact poorly if at all. In other languages, these are effectively the same thing, lighter weight, and actually use multicore.
If I launch 50 threads with runaway while loops in Python, it takes minutes to launch and barely works afterward. I can run hundreds of thousands, even millions, of runaway processes in Elixir/Erlang; they launch very fast and keep chugging along just fine.
> If I launch 50 threads with runaway while loops in Python, it takes minutes to launch and barely works afterward. I can run hundreds of thousands, even millions, of runaway processes in Elixir/Erlang; they launch very fast and keep chugging along just fine.
I'm not sure that argument helps your position on threading. I once saw a Java program spin off 3000 threads doing god knows what. Debugging the fucking thing was impossible.
The whole purpose of threads is to improve overall speed of execution. Unless you're working with a very small number of threads (single digits), that's a very hard goal to achieve in Python. I wouldn't count this as easy to use. It's easy to program, yes, but not easy to get working with reasonably acceptable performance.
It's not such a big pain in every language. And certainly not as hard to get working with acceptable performance in many languages.
Even if you have zero shared resources, zero mutexes, no communication whatsoever between threads, it's a huge pain in Python if you need +10-ish threads going. And many times the GIL is the bottleneck.
This is where Python's GIL bit me: I was more than familiar with how to shoot myself in the foot using threads in other languages, and careful to avoid those traps. Threads spun up only in situations where they had their own work to do and well-defined conditions for how both failure and success would be reported back to the thread that requested it, along with a pool that wouldn't exceed available resources.
Like every other language I've used this approach with, nothing bad happened - the program ran as expected and produced correct results. Unlike every other language, spreading calculations across multiple cores didn't appreciably improve performance. In some cases, it got slower.
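For anyone who wants to see this on their own machine, a rough sketch follows. `busy_sum` is a made-up pure-Python workload, and the job sizes are arbitrary. It only measures; which side wins depends on the interpreter build (a free-threaded build may behave differently from a GIL build), so no expected numbers are shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy_sum(n):
    """Pure-Python CPU-bound loop; on a GIL build it holds the GIL while running."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(jobs):
    """Run every job on the calling thread and time it."""
    start = time.perf_counter()
    results = [busy_sum(n) for n in jobs]
    return results, time.perf_counter() - start


def run_threaded(jobs, workers):
    """Run the same jobs on a bounded thread pool and time it."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(busy_sum, jobs))
    return results, time.perf_counter() - start


jobs = [200_000] * 8
seq_results, seq_seconds = run_sequential(jobs)
thr_results, thr_seconds = run_threaded(jobs, workers=8)
print(f"sequential: {seq_seconds:.3f}s  threaded: {thr_seconds:.3f}s")
```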
Eventually scrapped it all, and went with an approach closer to what I'd have done with C and fork() decades ago... Which, to Python's credit, was fairly painless and worked well. But it caught me off-guard, because with asyncio for IO-bound stuff, it didn't seem like threads really have much of a purpose in Python, other than to be a tripwire for unwary and overconfident folks like myself!
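The process-based version probably looked something like the sketch below. This is a guess at the shape, not the actual code: `count_primes` is a stand-in workload and `count_primes_in_processes` is a hypothetical helper. Each worker process has its own interpreter, so CPU-bound units can run on separate cores.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work unit: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_in_processes(limits, workers=None):
    """Spread work units over a pool of worker processes.

    workers=None lets the executor pick its default pool size.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))
```

Call `count_primes_in_processes([50_000, 60_000, 70_000])` from under an `if __name__ == "__main__":` guard, since some start methods re-import the main module in each worker. Arguments and results are pickled across the process boundary, so this pays off when the work per unit is large compared to the data being shipped.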
> If you want to spawn +10-20 threads in a process, it can quickly become way slower than running a single thread.
As you know, that's mostly true of threads in general. Any optimisation has a drawback, so you need to choose wisely.
I once made a horror of a thing that synced S3 with another S3-like, but not quite S3, object store. I needed to move millions of files, but on the S3-like store every metadata operation took 3 seconds.
So I started with async (pro tip: it's never a good idea to use async. It's basically gotos with two dimensions of surprise: first, when the function returns; second, when you get an exception). I then moved to threads, which got a tiny bit of extra performance but much easier debuggability. Then I moved to multiprocess pools of threads (fuck yeah, super fast), but then I started hitting network IO limits.
So then I busted out to an Airflow-like system with operators spawning 10 processes with 500 threads.
It wasn't very memory efficient, but it moved many thousands of files a second.
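The thread layer of a setup like that can be sketched roughly as follows. Everything here is illustrative: `copy_object` is a hypothetical stand-in that just sleeps to simulate a slow round trip (no real S3 client is used), and `sync_keys` is a made-up name. It shows why threads work for this kind of job: a thread blocked on I/O releases the GIL, so hundreds of slow calls can overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def copy_object(key, latency=0.01):
    """Hypothetical stand-in for one slow metadata-plus-copy round trip."""
    time.sleep(latency)  # a blocking wait releases the GIL
    return key


def sync_keys(keys, threads=50):
    """Copy every key on a bounded thread pool; collect failures instead of raising."""
    copied, failed = [], []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(copy_object, k): k for k in keys}
        for fut in as_completed(futures):
            key = futures[fut]
            try:
                copied.append(fut.result())
            except Exception as exc:  # record the failure and keep going
                failed.append((key, exc))
    return copied, failed


copied, failed = sync_keys([f"obj-{i}" for i in range(200)])
```

Running several of these pools in separate processes is the next step described above; that part is left out of the sketch.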
This is entirely fair, and I wish I'd been a little less grumpy in my initial reply (I assign some blame to just getting over an illness). Thank you for the gentle correction!
That said - I think it's fair to be irritated by people who write Python off as entirely useless because it is not _the fastest_ language. As you rightly say - it's fast enough for many purposes. It does bother me to see Python immediately counted out of discussions because of its speed when the app in question is extremely insensitive to speed.
I have been on teams where Python based approaches were discounted due to “speed” and “industry best practice” and then had the very same engineers create programs that are slow by design in a “fast” language and introduce needless complexity (and bugs) through “faster” database processes.
Like you said, it’s the thoughtless criticism. The meme. I am happy for Python to lose in a design analysis because it’s too slow for what we are building; I am loath to let it lose because whoever is doing the analysis with me has heard it’s slow.
Which is to say, I get what you’re saying. I think people have been a little ungenerous with your comment.
> I think people have been a little ungenerous with your comment.
Eh - I engaged with a fraught topic in a snarky way without clarifying that I meant the unintuitive-but-technically-literally-accurate interpretation of my words. Maybe some people have been less-generous than they could have been, but I don't begrudge it - if I look sufficiently like a troll, I won't complain when I get treated like one. Not everyone has the time and mental fortitude to treat everyone online with infinite patience and kindness - I know I sure don't.
In some ways the weakness was even a virtue. Because Python threads are slow, Python has incredible toolsets for multiprocess communication, task queues, job systems, etc.
Maybe it'll shut up "architects" who hack up a toy example in <new fast language hotness>, drop it on a team to add all the actual features, tests, deployment strategy, and maintain, and fly away to swoop and poop on someone else. Gee thanks for your insight; this API serves maybe 1 request a second, tops. Glad we optimized for SPEEEEEED of service over speed of development.
You seem to be implying that there is something inherently slow to Python. What?
This topic is an example: a detail of one particular implementation, since the GIL is definitely not inherent to the language. Or is it just the usual worry about looseness of types?
There are worse hills to die on than this. But the Python ecosystem is very slow. It's a cultural thing.
The biggest impact would be completely redoing package discovery. Not in some straightforward sense of "what if PyPI showed you a Performance Measurement?" No, that's symptomatic of the same problem: harebrained and simplistic stuff for the masses.
But who's going to get rid of PyPI? Conda tried and it sucks, it doesn't change anything fundamental, they're too small and poor to matter.
Meta should run its own package index and focus on setuptools. This is a decision PyTorch has already taken, maybe the most exciting package in Python today, and for all the headaches that decision causes, look: torch "won," it is high performance Python with a vibrant high performance ecosystem.
These same problems exist in NPM too. It isn't an engineering or language problem. Poetry and Conda are not solutions, they're symptoms. There are already too many ideas. The ecosystem already has too much manic energy spread way too thinly.
Golang has "fixed" this problem as well as it could for non-commercial communities.
The "Python ecosystem" includes packages like numpy, pytorch & derivatives which are responsible for a large chunk of HPC and research computing nowadays.
> The "Python ecosystem" includes packages like numpy, pytorch & derivatives which are responsible for a large chunk of HPC and research computing nowadays.
The "& derivatives" part is the problem! Torch does not have derivatives. It won. You just use it and its extensions, and you're done. That is what people use to do exciting stuff in Python.
It's the manic developers writing manic derivatives that make the Python ecosystem shitty. I mean I hate ragging on those guys, because they're really nice people who care a lot about X, but if only they could focus all their energy to work together! Python has like 20 ideas for accelerated computing. They all abruptly stopped mattering because of Torch. If the numba and numpy and scikit-learn and polars and pandas and... all those people, if they would focus on working on one package together, instead of reinventing the same thing over and over again - high level cross compilers or an HPC DSL or whatever, the ecosystem would be so much nicer and performance would be better.
This idea that it's a million little ideas incubating and flourishing, it's cheerful and aesthetically pleasing but it isn't the truth. CUDA has been around for a long time, and it was obviously the fastest per dollar & watt HPC approach throughout its whole lifetime, so most of those little flourishing ideas were DOA. They should have all focused on Torch from the beginning instead of getting caught up in little manic compiler projects. We have enough compilers and languages and DSLs. I don't want another DataFrame DSL!
I see this in new, influential Python projects made even now, in 2024. Library authors are always, constantly, reinventing the wheel because the development is driven by one person's manic energy more than anything else. Just go on GitHub and look how many packages are written by one person. GitHub, Git, and PyPI are just not adequate ways to coordinate the energies of these manic developers on a single valuable task. They don't merge PRs, they stake out pleasing names on PyPI, and they complain relentlessly about other people's stuff. It's NIH syndrome on the 1m+ repository scale.
I've read that it can't even be as fast as JS, because everything is monkey-patchable at runtime. Maybe they can optimize for the case where that doesn't happen, but that remains to be seen.
Python is probably much more monkey-patchable. Almost any monkey patching that JavaScript supports also works in Python (e.g. modifying a class prototype corresponds to assigning class methods), but there are a few things that only Python can do: accessing local variables as a dict, accessing other stack frames, modifying function bytecode, reading and writing closure variables, and patching builtins in ways that change how the language works (__import__, __build_class__). Many of these can make a language hard to optimize.
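A few of these are easy to show. The sketch below covers three of them (patching a class method, rewriting a closure variable from outside, and reading a caller's locals through its stack frame); the class and function names are made up for the example. `sys._getframe` is a CPython implementation detail, so that part is an assumption about the interpreter in use.

```python
import sys


class Greeter:
    def greet(self):
        return "hello"


g = Greeter()
# 1. Replace a method on the class; existing instances see the change.
Greeter.greet = lambda self: "patched"
patched_greeting = g.greet()


def make_counter():
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
counter()  # count is now 1
# 2. Overwrite a closure variable from outside the function.
counter.__closure__[0].cell_contents = 100
after_rewrite = counter()


def peek_caller_locals():
    # 3. Read the caller's local variables through its stack frame.
    # sys._getframe is CPython-specific.
    return dict(sys._getframe(1).f_locals)


def caller():
    secret = 42
    return peek_caller_locals()


seen = caller()
```

Any optimizer has to assume that code like this might run at any moment, which is why guards and fallbacks end up everywhere.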
You can always use optimistic optimization strategies where you profile the fast path and optimize that. When someone does something slow, you tell them to stop doing it if they want better performance.
In any case, that should be irrelevant to getting a reasonably performant JIT running. Lots of AOT and JIT compiled languages have robust FFI functionality.
The native extensions are more relevant when we talk about removing the GIL, since lots of Python code may call into non thread safe C extension code.
Python is inherently slow. That’s why people tend to rewrite bits that need high performance in C/C++. Removing the GIL is a massively welcome change, but it isn’t going to make C extensions go away.