| HN Mirror


by quietbritishjim 1334 days ago

For me it's not about efficiency. Using asyncio is just easier than threads.

* One coroutine can only interrupt another one at a point clearly marked with await (or async for or async with). That makes it easier to avoid data races without explicit synchronisation like locks.

* It's much easier to spawn async tasks and avoid them getting lost than with threads, assuming you use asyncio task groups (either by using a future version of Python, or using the anyio library now, or using Trio instead of asyncio).

* Async operations all have first class support for cancellation, and this interacts really cleanly with task groups. That helps with things like timeouts, clean shutdown of your program, or cleaning up all resources related to a connection when that connection is closed.

* There's a bit more boilerplate in spawning threads, exchanging messages with them, and joining them than there is in spawning async tasks, especially when using task groups. (Admittedly, this is a solvable problem, and there are probably good libraries out there to help with this.)
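A minimal sketch of the first and third points, using only plain asyncio primitives (`asyncio.gather` and `asyncio.wait_for`) so it does not depend on task groups. The names `bump`, `slow_connection` and `state` are made up for illustration. The read-modify-write in `bump` needs no lock because no other coroutine can run between the read and the write, and the timeout cancels the slow task while still letting its `finally` cleanup run.

```python
import asyncio


async def bump(state, times):
    # Hypothetical worker: increments a shared counter without any lock.
    for _ in range(times):
        current = state["count"]      # read
        state["count"] = current + 1  # write: no await in between, so no interleaving
        await asyncio.sleep(0)        # the only point where another coroutine may run


async def slow_connection(cleaned_up):
    # Hypothetical long-running operation with cleanup attached.
    try:
        await asyncio.sleep(10)
    finally:
        # Runs even when the task is cancelled by the timeout below.
        cleaned_up.append("connection closed")


async def main():
    state = {"count": 0}
    await asyncio.gather(*(bump(state, 100) for _ in range(10)))

    cleaned_up = []
    try:
        await asyncio.wait_for(slow_connection(cleaned_up), timeout=0.05)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    return state["count"], timed_out, cleaned_up


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The same unlocked read-modify-write run on threads could lose updates, since a thread switch can happen between the read and the write.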


zzzeek 1334 days ago

well that's the thing with threads, you shouldn't be "spinning them up on the fly", you should have a fixed pool of threads. That then involves some architectural work up front (like 5 lines of code, ugh) and that's where everyone (under age 40) yawns and goes off to use asyncio instead (which oddly enough has a worker thread running in the form of the event loop, it's just all been presented nicely).


quietbritishjim 1334 days ago

It sounds like you're talking about a different situation. The parent comment was talking about thread-per-connection, with a blocking IO read call on each thread. Yes, that means spinning up and shutting down threads as connections open and close, and that is 100% a valid strategy. If you have a fixed pool of n threads and you get n+1 connections, then you're just going to have to ignore one at any given time (potentially causing deadlock, depending on the relationship between the connections) or end up using a multiplexing API, at which point you're not far off from the async world anyway.
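To make the n+1 case concrete, here is a rough sketch using `concurrent.futures.ThreadPoolExecutor` with a pool of 2 and 3 simulated connections. The `handle` function and the `release` event are hypothetical stand-ins for a blocking read that only returns when the peer sends something. While the first two handlers are blocked, the third connection is never even started.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def demo(pool_size=2, connections=3):
    started = []
    lock = threading.Lock()
    release = threading.Event()  # stands in for "the peer finally sent data"

    def handle(conn_id):
        with lock:
            started.append(conn_id)
        release.wait()  # simulated blocking read on this connection
        return conn_id

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(handle, i) for i in range(connections)]

        # Wait until every worker thread is busy with a connection.
        deadline = time.monotonic() + 5
        while time.monotonic() < deadline:
            with lock:
                if len(started) >= pool_size:
                    break
            time.sleep(0.01)
        time.sleep(0.1)  # extra time in which the last connection could start, but cannot

        with lock:
            started_while_blocked = sorted(started)

        release.set()  # unblock the handlers so the queued connection gets a turn
        results = [f.result() for f in futures]
    return started_while_blocked, results


if __name__ == "__main__":
    print(demo())
```

If the blocked handlers were waiting on something only the queued connection could provide, nothing would ever call `release.set()`, which is the deadlock scenario described above.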

Maybe you're talking about just submitting independent work items to run concurrently – yes async won't help much with that, because you're in the most trivial situation possible.

In more complex situations, with interrelationships between tasks (/threads), async syntax and task groups definitely have a huge impact. And, as I said, that's before you even get into how much easier it makes cancellation.


zzzeek 1334 days ago

yes, parent was referring to "each thread with a blocking connection", but you still can (and probably should) use a thread pool for that. In the naive approach, new connections beyond the limit of your threadpool either have to wait, or you have to dynamically expand your threadpool. mod_wsgi's daemon mode has the option to use a thread pool of a fixed size to handle requests.

you can also use non-blocking handles with a fixed/dynamic threadpool and use epoll or similar to find those handles with data ready, and send those into your pool, thereby servicing an arbitrary number of connections with a controlled level of concurrency among them. MariaDB has an option to do that here: https://mariadb.com/kb/en/thread-pool-in-mariadb/ . this is not as trivial as spinning up asyncio tasks but that's because there's (AFAIK) no friendly library giving you an easy way of doing it. But it's Python and if you're writing a server to handle MariaDB server loads using a thread pool with direct use of epoll(), you're likely in the wrong language.
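A rough sketch of that pattern in Python, assuming the standard `selectors` module (which picks epoll on Linux) in place of direct `epoll()` calls, and `socket.socketpair()` to simulate client connections. It is not how MariaDB or mod_wsgi implement it; `serve_ready` and `demo` are hypothetical names. Five connections are serviced by a pool of two threads, and a handle is only handed to the pool once the selector reports it readable.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def serve_ready(conn):
    # Runs in a pool thread; the selector already reported data, so recv won't block.
    return conn.recv(1024).upper()


def demo(num_connections=5, pool_size=2):
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(num_connections)]
    for i, (client, server) in enumerate(pairs):
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ, data=i)
        client.sendall(f"hello {i}".encode())  # simulated client request

    futures = {}
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        while len(futures) < num_connections:
            for key, _events in sel.select(timeout=1.0):
                # Unregister first so the same handle is not dispatched twice
                # while a pool thread is still reading from it.
                sel.unregister(key.fileobj)
                futures[key.data] = pool.submit(serve_ready, key.fileobj)
        results = {i: f.result() for i, f in futures.items()}

    sel.close()
    for client, server in pairs:
        client.close()
        server.close()
    return results


if __name__ == "__main__":
    print(demo())
```

A real server would re-register each handle after its request is processed and would also have to handle partial reads, writes, and disconnects, which is where most of the extra work compared with asyncio tasks comes from.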
