by js2 1334 days ago
I've been coding Python since the 2.5 days and I have yet to hit a use case where I've really needed asyncio. For client-side code, concurrent.futures (specifically ThreadPoolExecutor) has satisfied nearly every use case, though occasionally I'll use a worker-thread model.
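For readers who haven't used it, here is a minimal sketch of the ThreadPoolExecutor pattern described above. The `fetch` function is a hypothetical stand-in for any blocking I/O call (an HTTP request, a database query), and `fetch_all` and the worker count are illustrative choices, not anything from the original project.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(item):
    """Hypothetical blocking call; the sleep simulates waiting on I/O."""
    time.sleep(0.05)
    return item * 2


def fetch_all(items, max_workers=8):
    """Run fetch() for every item on a fixed-size pool of threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the item it was submitted for.
        futures = {pool.submit(fetch, item): item for item in items}
        # as_completed yields futures as they finish, in any order.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    print(fetch_all(range(5)))
```

Leaving the `with` block joins the worker threads, so nothing is left running once `fetch_all` returns.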

For server-side code, I'd still probably use threads up to maybe 1000 concurrent connections. Beyond that, I've used gevent to good effect. For example, I have a server that receives HTTP POSTs which are multipart forms, each form having 3 parts: a JSON part and two file parts. The two file parts get written to files on S3 and the JSON part to SQS. The web framework is Falcon[1] and I also made use of a Cython-based HTTP form parser[2]. Concurrency is handled via gevent. Openresty sits in front and invokes the Python server via uwsgi. At the time I developed it, asyncio was not yet mature and not supported by boto3. I benchmarked against pypy but unsurprisingly (since it's I/O bound) got better performance from CPython + gevent.

If I were developing it from scratch today, I'd re-evaluate the asyncio story, or more likely than not, choose a different language.

I don't doubt that there are use cases to which asyncio is well-suited and the right choice, but I suspect folks may be using it in cases where they'd be fine with threads. As always, there are trade-offs.

1. https://falconframework.org/

2. https://pypi.org/project/streaming-form-data/ (I think)

2 comments

For me it's not about efficiency. Using asyncio is just easier than threads.

* One coroutine can only interrupt another one at a point clearly marked with await (or async for or async with). That makes it easier to avoid data races without explicit synchronisation like locks.

* It's much easier to spawn async tasks and avoid them getting lost than with threads, assuming you use asyncio task groups (either by using a future version of Python, or using the anyio library now, or using Trio instead of asyncio).

* Async operations all have first class support for cancellation, and this interacts really cleanly with task groups. That helps with things like timeouts, clean shutdown of your program, or cleaning up all resources related to a connection when that connection is closed.

* There's a bit more boilerplate in spawning threads, exchanging messages with them and joining them than there is in spawning async tasks, especially when using task groups. (Admittedly, this is a solvable problem, and there are probably good libraries out there to help with this.)
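A rough sketch of two of the points above, using only long-standing asyncio calls (`asyncio.gather` and `asyncio.wait_for`) so that it runs on older Python versions; on Python 3.11+ `asyncio.TaskGroup` is the task-group API. The names `bump`, `slow_connection` and `demo` are hypothetical. The first half updates shared state without a lock, which is safe only because there is no `await` between the read and the write. The second half shows a timeout cancelling a coroutine and its `finally` block still running.

```python
import asyncio


async def bump(state, times):
    for _ in range(times):
        # No await between the read and the write, so no other
        # coroutine can run in between and no lock is needed.
        current = state["count"]
        state["count"] = current + 1
        # Explicit switch point: other coroutines may run here.
        await asyncio.sleep(0)


async def slow_connection(events):
    try:
        await asyncio.sleep(3600)  # pretend to wait on a peer
    finally:
        # Runs when the coroutine is cancelled by the timeout.
        events.append("cleaned up")


async def demo():
    state = {"count": 0}
    await asyncio.gather(*(bump(state, 1000) for _ in range(10)))

    events = []
    try:
        await asyncio.wait_for(slow_connection(events), timeout=0.05)
    except asyncio.TimeoutError:
        events.append("timed out")
    return state["count"], events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The equivalent thread-based counter would need a lock around the read-modify-write, and stopping a thread blocked in a read usually needs some cooperation from the thread itself.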

well that's the thing with threads, you shouldn't be "spinning them up on the fly", you should have a fixed pool of threads. That then involves some architectural work up front (like 5 lines of code, ugh) and that's where everyone (under age 40) yawns and goes off to use asyncio instead (which oddly enough has a worker thread running in the form of the event loop, it's just all been presented nicely).

It sounds like you're talking about a different situation. The parent comment was talking about thread-per-connection with a blocking IO read call on each. Yes, that means spinning up and shutting down threads as connections open and close, and that is 100% a valid strategy. If you have a fixed pool of n threads and you get n+1 connections then you're just going to have to ignore one at any given time (potentially causing deadlock depending on the relationship between the connections) or end up using a multiplexing API, at which point you're not far off from async world anyway.

Maybe you're talking about just submitting independent work items to run concurrently – yes async won't help much with that, because you're in the most trivial situation possible.

In more complex situations, with interrelationships between tasks (/threads), async syntax and task groups definitely has a huge impact. And, as I said, that's before you even get into how much easier it makes cancellation.

yes, parent was referring to "each thread with a blocking connection", but you still can (and probably should) use a thread pool for that. In the naive approach, new connections beyond the limit of your threadpool either have to wait, or you have to dynamically expand your threadpool. mod_wsgi's daemon mode has the option to use a thread pool of a fixed size to handle requests.

you can also use non-blocking handles with a fixed/dynamic threadpool and use epoll or similar to find those handles with data ready, and send those into your pool, thereby servicing an arbitrary number of connections with a controlled level of concurrency among them. MariaDB has an option to do that here: https://mariadb.com/kb/en/thread-pool-in-mariadb/ . this is not as trivial as spinning up asyncio tasks but that's because there's (AFAIK) no friendly library giving you an easy way of doing it. But it's Python and if you're writing a server to handle MariaDB server loads using a thread pool with direct use of epoll(), you're likely in the wrong language.
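To make the "epoll plus fixed pool" idea concrete, here is a small sketch using the standard library's `selectors` module (which picks epoll where available) together with ThreadPoolExecutor. It is a toy under stated assumptions, not a server: `handle`, `serve_once` and `demo` are hypothetical names, connections are simulated with `socket.socketpair()`, each connection is serviced exactly once, and partial writes are not handled.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def handle(conn):
    """Runs on a pool thread; the selector reported conn as readable."""
    data = conn.recv(4096)
    conn.sendall(data.upper())  # tiny payload; real code must handle partial writes


def serve_once(connections, max_workers=4, timeout=1.0):
    """Service each ready connection once, using at most max_workers threads."""
    sel = selectors.DefaultSelector()
    for conn in connections:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)
    pending = len(connections)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        while pending:
            events = sel.select(timeout)
            if not events:
                break  # nothing became readable within the timeout
            for key, _mask in events:
                # Unregister first so the same socket is not dispatched twice.
                sel.unregister(key.fileobj)
                futures.append(pool.submit(handle, key.fileobj))
                pending -= 1
        for future in futures:
            future.result()  # re-raise any handler exception
    sel.close()


def demo(n=10):
    pairs = [socket.socketpair() for _ in range(n)]
    for i, (client, _server) in enumerate(pairs):
        client.sendall(f"msg {i}".encode())
    serve_once([server for _client, server in pairs])
    replies = [client.recv(4096) for client, _server in pairs]
    for client, server in pairs:
        client.close()
        server.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

Here ten connections are serviced by four threads, so the level of concurrency is set by the pool size rather than by the number of connections.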

in my largely non-scientific experience, Python starts to fall over at about 50 threads, 1000 seems impossible but I haven't really tried.

1000 threads was a very specific use case that I probably shouldn't have generalized from: I needed to match the number of Python threads running in a web server to a Java process running on the same host that used the same number of threads. They were mostly idle.

There's no reason Python should fall over at any number of threads. You just usually end up either running out of memory or (more likely) saturate a single CPU core well before that number of threads.

Without consulting my notes I can't recall why I didn't use gevent on that project.
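Anyone who wants to try the "mostly idle threads" claim on their own machine could start with something like the sketch below. It is an assumption-laden toy (`park_idle_threads` is a hypothetical name): it only parks threads on a `threading.Event`, so it says nothing about threads that do real work, and the achievable count depends on OS limits and available memory.

```python
import threading


def park_idle_threads(count=1000):
    """Start `count` threads that block on an Event, then release them.

    Returns how many of the threads were alive while parked.
    """
    stop = threading.Event()
    threads = [
        threading.Thread(target=stop.wait, daemon=True) for _ in range(count)
    ]
    for thread in threads:
        thread.start()
    # Every thread is now blocked in stop.wait(), using no CPU.
    alive = sum(thread.is_alive() for thread in threads)
    stop.set()  # wake all threads so they can exit
    for thread in threads:
        thread.join()
    return alive


if __name__ == "__main__":
    print(park_idle_threads(1000))
```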