Hacker News new | ask | show | jobs
by kstrauser 405 days ago
I haven’t been using it that much longer than you, and I agree with most of what you’re saying, but I’d characterize it differently.

Python has a lot of solid workarounds for avoid threading because until now Python threading has absolutely sucked. I had naively tried to use it to make a CPU-bound workload twice as fast and soon realized the implications of the GIL, so I threw all that code away and made it multiprocessing instead. That sucked in its own way because I had to serialize lots of large data structures to pass around, so 2x the cores got me about 1.5x the speed and a warmer server room.

I would love to have good threading support in Python. It’s not always the right solution, but there are a lot of circumstances where it’d be absolutely peachy, and today we’re faking our way around its absence with whole playbooks of alternative approaches to avoid the elephant in the room.

But yes, use async when it makes sense. It’s a thing of beauty. (Yes, Glyph, we hear the “I told you so!” You were right.)

2 comments

> That sucked in its own way because I had to serialize lots of large data structures to pass around, so 2x the cores got me about 1.5x the speed and a warmer server room.

In many cases you can't reasonably expect better than that (https://en.wikipedia.org/wiki/Amdahl's_law). If your algorithm involves sharing "large data structures" in the first place, that's a bad sign.

That's true, but you can sometimes get a whole lot closer if you can share state between threads. Sometimes you can't help the size of the data. Maybe you have a thread reading frames from a video and passing them to workers for analysis. You might get crazy IO contention if you pass around "foo.vid;frame222" and "foo.vid;frame223" to the workers and make them retrieve that data themselves.

There may be another way to skin that specific cat. My point isn't to solve one specific problem, but to say that some problems are just inherently large. And with Python, today, if those workers are CPU-bound in Python-land, that means running separate processes and passing large hunks of state around (or shoving it through SHM; same idea, just a different way of passing state).

I find python's async to be lacking in fine grained control. It may be fine for 95% of simple use cases, but lacks advanced features such as sequential constraining, task queue memory management, task pre-emption etc. The async keword also tends to bubble up through codebases in aweful ways, making it almost impossible to create reasonably decoupled code.