

	by greyman 2872 days ago
	Python is my language of first choice, but I must say that I am not that thrilled with how this multithreading ended up. There are many tutorials about the topic promising to explain how it works, usually in the form of a "simple introduction". But when one tries to implement something production-ready, with correct error handling etc., things start to get complicated pretty quickly; at least that was my experience. I don't want to accuse anyone specifically, but most of the tutorials I saw seem to portray it in a way that makes it look easier than it actually is. Ultimately, my company decided that instead of fighting with asyncio, certain projects will switch to Go.

4 comments

sametmax 2872 days ago

That's because most of those tutorials have not been written by somebody actually putting something in production.

I've been using asyncio for a while now, and you can't get away with a short introduction since:

- it's very low level

- it's full of design flaws and already has accumulated technical debt

- it requires very specific best practices to be usable

I'm not going to write a tutorial here, it would take me a few days to make a proper one, but a few pointers nobody tells you:

- asyncio solves one problem, and one problem only: when the bottleneck of your program is network IO. It's a very small domain. Most programs don't need asyncio at all. Actually many programs with a lot of network IO don't have performance problems, and hence don't need asyncio. Don't use asyncio if you don't need it: it adds complexity that is worth it only if it solves your problem.

- asyncio is mostly very low level. Unless you code your own lib or framework with it, you probably don't want to use it directly. E.G: if you want to make http requests, use aiohttp.

- use loop.run_until_complete(), not loop.run_forever(). The former will crash on any exception, making debugging easy. The latter will just display the stack trace in the console.

- talking about easy debugging, activate the various debug features when not in prod (https://docs.python.org/3/library/asyncio-dev.html#debug-mod...). Too many people code with asyncio in the dark, and don't know there are plenty of debug info available.

- await is just a way to inline a callback. When you do "await", you say 'do the stuff', and any lines of code that are after "await" are called when "await" is done. You can run asynchronous things without "await". "await" is just useful if you want 2 asynchronous things to happen one __after__ another. Hence, don't use it if you want 2 asynchronous things to progress in parallel.

- if you want to run one asynchronous thing, but not "await" it, call "asyncio.ensure_future()".

- errors in "await" can be just caught with try/except. If you used ensure_future() and no "await", you'll have to attach a callback with "add_done_callback()" and check manually if the future has an exception. Yes, it sucks.

- if you want to run one blocking thing, call "loop.run_in_executor()". Careful, the signature is weird.

- CPU intensive code blocks the event loop. loop.run_in_executor() uses threads by default, hence it doesn't protect you from that. If you have CPU intensive code, like zipping a lot of files or calculating your own precious fibonacci, create a "ProcessPoolExecutor" and use run_in_executor() with it.

- don't use asyncio before Python 3.5.3. There is an incredibly major bug with "asyncio.get_event_loop()" that makes it unusable for anything that involves mixing threads and loops. Yep. Not a joke.

- but really use 3.6. TCP_NODELAY is on by default and you have f-strings anyway.

- don't pass the loop around. Use asyncio.get_event_loop(). This way your code will be independent of the loop creation process.

- you do pretty much nothing yourself in asyncio. Any async magic is deep, deep down the lib. What you do is define coroutines calling the magic things with ensure_future() and await. Pretty much nothing in your own code is doing IO, it's just asking the asyncio code to do IO in a certain order.

- you see people in tutorials simulate IO by doing "asyncio.sleep()". It's because it's the easiest way to make the event loop switch context without using the network. It doesn't mean anything, it just pauses and switches, but if you see that in a tutorial, you can mentally replace it with, say, an http call, to get a more realistic picture.

- asyncio comes with a lot of concepts, let's take the time to define them:

    * Future: an object with a thing to execute, with potentially some callbacks to be called after it's executed.
    
    * Task: a subclass of future. The thing to execute is a coroutine, and the coroutine is immediately scheduled in the event loop when the task is instantiated. When you do ensure_future(coroutine), it returns a Task.

    * coroutine: a generator with some syntactic sugar. Honestly that's pretty much it. They don't do much by themselves, except you can use await in them, which is handy. You get one by calling a coroutine function.

    * coroutine function: a function declared with "async def". When you call it, it doesn't run the code of the function. Instead, it returns a coroutine. 

    * awaitable: any object with an __await__ method. This method is what the event loop uses to execute the code asynchronously. Coroutines, tasks and futures are awaitables. Now the dirty secret is this: you can write an __await__ method, but in it, you will mostly call the __await__ from some magical object from deep inside asyncio. Unless you write a framework, don't think too much about it: awaitable = stuff you can pass to ensure_future() to tell the event loop to run it. Also, you can "await" any awaitable.

    * event loop: the magic "while True" loop that takes awaitables and executes them. When the code hits "await", the event loop switches from one awaitable to another, and then goes back to it later.

    * executor: an object that takes code, executes it in a __different__ context, and returns a future you can await in your __current__ context. You will use them to run stuff in threads or separate processes, but magically await the result in your current code like it's regular asyncio. It's very handy to naturally integrate blocking code in your workflow.

    * event loop policy: the stuff that creates the loop. You can override that if you are writing a framework and want to get fancy with the loop. Don't do it. I've done it. Don't.

    * task factory: the stuff that creates the tasks. You can override that if you are writing a framework and want to get fancy with the tasks. Don't do it either.

    * protocols: abstract classes you can implement to tell asyncio __what__ to do when it establishes/loses a connection or sends/receives a packet. asyncio instantiates one protocol for each connection. Problem is: you can't use "await" in protocols, only old-fashioned callbacks.

    * transports: abstract classes you can implement to tell asyncio __how__ to establish/lose a connection or send/receive a packet.
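
To make two of the pointers above a bit more concrete, here is a minimal sketch, not a definitive recipe. The names blocking_io, failing, call_blocking and fire_and_check are hypothetical, made up for the illustration; only the asyncio calls come from the points above. It shows the executor-first signature of run_in_executor(), and the add_done_callback() dance you need to see an exception from something you scheduled with ensure_future() but never awaited.

```python
import asyncio
import functools
import time


def blocking_io(delay, label="done"):
    # Hypothetical stand-in for any blocking call (sync HTTP client, file IO...).
    time.sleep(delay)
    return label


async def call_blocking():
    loop = asyncio.get_event_loop()
    # The "weird" signature: the executor comes first (None = default thread
    # pool), then the callable, then positional arguments only. Keyword
    # arguments have to be bound beforehand with functools.partial.
    return await loop.run_in_executor(
        None, functools.partial(blocking_io, 0.01, label="from thread")
    )


async def failing():
    # Hypothetical coroutine that always fails, to have something to catch.
    await asyncio.sleep(0)
    raise ValueError("boom")


async def fire_and_check():
    errors = []

    def on_done(future):
        # exception() returns None when the task finished without raising.
        exc = future.exception()
        if exc is not None:
            errors.append(exc)

    task = asyncio.ensure_future(failing())  # scheduled, never awaited
    task.add_done_callback(on_done)
    await asyncio.sleep(0.05)  # give the loop time to run task and callback
    return errors


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    loop.set_debug(True)  # debug mode, as advised above for non-prod runs
    try:
        print(loop.run_until_complete(call_blocking()))
        print(loop.run_until_complete(fire_and_check()))
    finally:
        loop.close()
```

For CPU intensive work, the same call should accept a concurrent.futures.ProcessPoolExecutor instance in place of None; the sketch sticks to the default thread pool to stay short.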

Now, I'm putting the last point separately because if there is one thing you need to remember, it's this. It's the most underrated secret rule of asyncio. The stuff that is literally written nowhere ever, not in the doc, not in any tuto, etc.

asyncio.gather() is the most important function in asyncio
===========================================================

You see, every time you do asyncio.ensure_future() or loop.run_in_executor(), you actually do the equivalent of a GO TO. (see: https://vorpus.org/blog/notes-on-structured-concurrency-or-g...)

You have no freaking idea of when the code will start or end execution.

To stay sane, you should never, ever, have a dangling awaitable anywhere. Always get a reference on all your awaitables. Decide where in the code you think their life should end.

And at this very point, call asyncio.gather(). It will block until all awaitables are done.

E.G, don't:

    asyncio.ensure_future(bar())
    asyncio.get_event_loop().run_in_executor(None, barz)
    await asyncio.sleep(10)

E.G, do:

    foo = asyncio.ensure_future(bar())
    fooz = asyncio.get_event_loop().run_in_executor(None, barz)
    await asyncio.sleep(10)
    await asyncio.gather(foo, fooz)  # this is The Only True Way

Your code should be a meticulous tree of hierarchical calls to asyncio.gather() that delimits where things are supposed to stop. And if you think it's annoying, wait until you have to debug something whose life cycle you don't have control over.
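
A rough sketch of what such a tree could look like, under the assumption that each level only starts things it also gathers. fetch, blocking_step, child and parent are hypothetical names, and asyncio.sleep() stands in for real network IO as discussed above.

```python
import asyncio
import time


def blocking_step():
    # Hypothetical blocking function, run in the default thread pool below.
    time.sleep(0.01)
    return "blocking"


async def fetch(name, delay):
    await asyncio.sleep(delay)  # placeholder for a real network call
    return name


async def child():
    # Inner node: owns its two tasks and does not return before both end.
    a = asyncio.ensure_future(fetch("a", 0.01))
    b = asyncio.ensure_future(fetch("b", 0.02))
    return await asyncio.gather(a, b)


async def parent():
    loop = asyncio.get_event_loop()
    kids = asyncio.ensure_future(child())
    blocking = loop.run_in_executor(None, blocking_step)
    # Root node: nothing started in this function outlives this line.
    return await asyncio.gather(kids, blocking)


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    try:
        print(loop.run_until_complete(parent()))
    finally:
        loop.close()
```

gather() returns the results in the order the awaitables were passed in, so the nesting of the result mirrors the nesting of the calls.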

Of course it gets old pretty fast, so you may want to write some abstraction layer such as https://github.com/Tygs/ayo. But I wouldn't use this one in production just yet.