by greyman
2872 days ago
Python is my language of first choice, but I must say that I am not that thrilled with how this multithreading ended up. There are many tutorials about the topic promising to explain how it works, usually in the form of a "simple introduction". But when one tries to implement something production-ready, with correct error handling etc., things start to get complicated pretty quickly; at least that was my experience. I don't want to accuse anyone specifically, but most of the tutorials I saw seem to portray it in a way that makes it look easier than it actually is. Ultimately, my company decided that, instead of fighting with asyncio, certain projects will switch to Go.

I've been using asyncio for a while now, and you can't get away with a short introduction since:
- it's very low level
- it's full of design flaws and already has accumulated technical debt
- it requires very specific best practices to be usable
I'm not going to write a tutorial here, it would take me a few days to make a proper one, but here are a few pointers nobody tells you:
- asyncio solves one problem, and one problem only: when the bottleneck of your program is network IO. It's a very small domain. Most programs don't need asyncio at all. Actually many programs with a lot of network IO don't have performance problems, and hence don't need asyncio. Don't use asyncio if you don't need it: it adds complexity that is worth it only if it solves your problem.
- asyncio is mostly very low level. Unless you code your own lib or framework with it, you probably don't want to use it directly. E.G: if you want to make http requests, use aiohttp.
- use loop.run_until_complete(), not loop.run_forever() (both are methods of the event loop, not functions of the asyncio module). The former will crash on any exception, making debugging easy. The latter will just display the stack trace in the console.
- talking about easy debugging, activate the various debug features when not in prod (https://docs.python.org/3/library/asyncio-dev.html#debug-mod...). Too many people code with asyncio in the dark, and don't know there are plenty of debug info available.
- await is just a way to inline a callback. When you do "await", you say 'do the stuff', and any lines of code that are after "await" are called when "await" is done. You can run asynchronous things without "await". "await" is just useful if you want 2 asynchronous things to happen one __after__ another. Hence, don't use it if you want 2 asynchronous things to progress in parallel.
- if you want to run one asynchronous thing, but not "await" it, call "asyncio.ensure_future()".
- errors in "await" can be just caught with try/except. If you used ensure_future() and no "await", you'll have to attach a callback with "add_done_callback()" and check manually if the future has an exception. Yes, it sucks.
- if you want to run one blocking thing, call "loop.run_in_executor()". Careful, the signature is weird.
- CPU intensive code blocks the event loop. loop.run_in_executor() uses threads by default, hence it doesn't protect you from that. If you have CPU intensive code, like zipping a lot of files or calculating your own precious fibonacci, create a "ProcessPoolExecutor" and use run_in_executor() with it.
- don't use asyncio before Python 3.5.3. There is an incredibly major bug with "asyncio.get_event_loop()" that makes it unusable for anything that involves mixing threads and loops. Yep. Not a joke.
- but really use 3.6. TCP_NODELAY is on by default and you have f-strings anyway.
- don't pass the loop around. Use asyncio.get_event_loop(). This way your code will be independent of the loop creation process.
- you do pretty much nothing yourself in asyncio. Any async magic is deep, deep down the lib. What you do is define coroutines calling the magic things with ensure_future() and await. Pretty much nothing in your own code is doing IO, it's just asking the asyncio code to do IO in a certain order.
- you see people in tutorials simulate IO by doing "asyncio.sleep()". It's because it's the easiest way to make the event loop switch context without using the network. It doesn't mean anything, it just pauses and switches, but if you see that in a tutorial, you can mentally replace it with, say, an http call, to get a more realistic picture.
- asyncio comes with a lot of concepts, let's take the time to define them:
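Going back to the pointers on "await", ensure_future() and error handling, here is a minimal sketch of the two ways an exception reaches you. The names flaky(), awaited(), fire_and_forget() and on_done() are made up for illustration, and asyncio.run() assumes Python 3.7+ (it wraps loop.run_until_complete()):

```python
import asyncio


async def flaky():
    # Hypothetical coroutine standing in for a network call that fails.
    await asyncio.sleep(0.01)
    raise ValueError("boom")


async def awaited():
    # With "await", the exception surfaces right here: plain try/except works.
    try:
        await flaky()
    except ValueError as exc:
        return f"caught: {exc}"


async def fire_and_forget():
    errors = []
    finished = asyncio.Event()

    def on_done(future):
        # No "await" means no try/except: check the future by hand.
        exc = None if future.cancelled() else future.exception()
        if exc is not None:
            errors.append(exc)
        finished.set()

    task = asyncio.ensure_future(flaky())
    task.add_done_callback(on_done)
    # Real code would do other work here; the sketch just waits for the callback.
    await finished.wait()
    return errors


async def main():
    handled = await awaited()
    errors = await fire_and_forget()
    return handled, errors


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling future.exception() in the callback also marks the exception as retrieved, so the loop does not complain about it later.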
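As for the weird signature of loop.run_in_executor(): the executor comes first, then the function, then positional arguments only. A minimal sketch, assuming a hypothetical blocking function blocking_io() and Python 3.7+ for asyncio.run():

```python
import asyncio
import functools
import time


def blocking_io(seconds, label="job"):
    # Hypothetical blocking call: it would freeze the event loop if called directly.
    time.sleep(seconds)
    return f"{label} finished"


async def main():
    loop = asyncio.get_event_loop()
    # run_in_executor(executor, func, *args): None means the default thread pool.
    # Keyword arguments are not forwarded, so wrap them with functools.partial.
    call = functools.partial(blocking_io, 0.05, label="report")
    future = loop.run_in_executor(None, call)

    # The loop stays free while the thread sleeps: count how often we get to run.
    ticks = 0
    while not future.done():
        await asyncio.sleep(0.01)
        ticks += 1

    result = await future
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(main()))
```

For CPU intensive work, the same call takes a concurrent.futures.ProcessPoolExecutor instance in place of None.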
Now, I'm putting the last point separately because if there is one thing you need to remember, it's this. It's the most underrated secret rule of asyncio. The stuff that is literally written nowhere ever, not in the doc, not in any tuto, etc.

asyncio.gather() is the most important function in asyncio
===========================================================
You see, every time you do asyncio.ensure_future() or loop.run_in_executor(), you actually do the equivalent of a GO TO. (see: https://vorpus.org/blog/notes-on-structured-concurrency-or-g...)
You have no freaking idea of when the code will start or end execution.
To stay sane, you should never, ever, have a dangling awaitable anywhere. Always get a reference on all your awaitables. Decide where in the code you think their life should end.
And at this very point, call "await asyncio.gather()" on them. It will block until all those awaitables are done.
E.G, don't:
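A minimal sketch of the pattern to avoid, assuming a hypothetical fetch() coroutine where asyncio.sleep() stands in for real network IO, and Python 3.7+ for asyncio.run(). Nobody keeps a reference to the two tasks, so nobody decides when they end:

```python
import asyncio

results = []


async def fetch(name, delay):
    # Hypothetical coroutine: asyncio.sleep() stands in for a network call.
    await asyncio.sleep(delay)
    results.append(name)


async def main():
    # Fire and forget: no reference kept, no gather(), no idea when these end.
    asyncio.ensure_future(fetch("a", 0.05))
    asyncio.ensure_future(fetch("b", 0.05))
    # main() returns right away. asyncio.run() then cancels whatever is still
    # pending, so neither fetch() ever gets to append its result.


if __name__ == "__main__":
    asyncio.run(main())
    print(results)
```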
E.G, do:
Your code should be a meticulous tree of hierarchical calls to asyncio.gather() that delimit where things are supposed to stop. And if you think it's annoying, wait until you have to debug something whose life cycle you don't have control over.
Of course it's getting old pretty fast, so you may want to write some abstraction layer such as https://github.com/Tygs/ayo. But I wouldn't use this one in production just yet.
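A minimal sketch of such a tree, assuming the hypothetical coroutines fetch() and load_page() (asyncio.sleep() stands in for real IO) and Python 3.7+ for asyncio.run(). Every task is referenced, and each level ends with an awaited asyncio.gather():

```python
import asyncio


async def fetch(name, delay, log):
    # Hypothetical leaf: asyncio.sleep() stands in for a network call.
    await asyncio.sleep(delay)
    log.append(name)
    return name


async def load_page(page, log):
    # Inner node of the tree: both children are done before this returns.
    parts = await asyncio.gather(
        fetch(f"{page}/header", 0.01, log),
        fetch(f"{page}/body", 0.02, log),
    )
    log.append(f"{page} done")
    return parts


async def main():
    log = []
    # Keep a reference on every task that gets started...
    home = asyncio.ensure_future(load_page("home", log))
    about = asyncio.ensure_future(load_page("about", log))
    # ...and decide that their life ends here, at the root of the tree.
    pages = await asyncio.gather(home, about)
    return pages, log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

gather() returns results in the order the awaitables were passed in, whatever order they actually finished in.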