
by sametmax 2877 days ago

That's because most of those tutorials have not been written by somebody actually putting something in production.

I've been using asyncio for a while now, and you can't get away with a short introduction since:

- it's very low level

- it's full of design flaws and already has accumulated technical debt

- it requires very specific best practices to be usable

I'm not going to write a tutorial here, it would take me a few days to make a proper one, but a few pointers nobody tells you:

- asyncio solves one problem, and one problem only: when the bottleneck of your program is network IO. It's a very small domain. Most programs don't need asyncio at all. Actually many programs with a lot of network IO don't have performance problems, and hence don't need asyncio. Don't use asyncio if you don't need it: it adds complexity that is worth it only if it solves your problem.

- asyncio is mostly very low level. Unless you code your own lib or framework with it, you probably don't want to use it directly. E.g., if you want to make HTTP requests, use aiohttp.

- use loop.run_until_complete(), not loop.run_forever() (both are methods of the event loop, not of the asyncio module). The former will crash on any exception, making debugging easy. The latter will just display the stack trace in the console.

- talking about easy debugging, activate the various debug features when not in prod (https://docs.python.org/3/library/asyncio-dev.html#debug-mod...). Too many people code with asyncio in the dark, and don't know there is plenty of debug info available.

- await is just a way to inline a callback. When you do "await", you say "do the stuff", and any lines of code that come after "await" are called when the awaited thing is done. You can run asynchronous things without "await". "await" is just useful if you want 2 asynchronous things to happen one __after__ another. Hence, don't use it if you want 2 asynchronous things to progress in parallel.

- if you want to run one asynchronous thing, but not "await" it, call "asyncio.ensure_future()".

- errors in "await" can be just caught with try/except. If you used ensure_future() and no "await", you'll have to attach a callback with "add_done_callback()" and check manually if the future has an exception. Yes, it sucks.

- if you want to run one blocking thing, call "loop.run_in_executor()". Careful, the signature is weird.

- CPU intensive code blocks the event loop. loop.run_in_executor() uses threads by default, hence it doesn't protect you from that. If you have CPU intensive code, like zipping a lot of files or calculating your own precious fibonacci, create a "ProcessPoolExecutor" and use run_in_executor() with it.

- don't use asyncio before Python 3.5.3. There is an incredibly major bug with "asyncio.get_event_loop()" that makes it unusable for anything that involves mixing threads and loops. Yep. Not a joke.

- but really use 3.6. TCP_NODELAY is on by default and you have f-strings anyway.

- don't pass the loop around. Use asyncio.get_event_loop(). This way your code will be independent of the loop creation process.

- you do pretty much nothing yourself in asyncio. Any async magic is deep, deep down in the lib. What you do is define coroutines calling the magic things with ensure_future() and await. Pretty much nothing in your own code is doing IO, it's just asking the asyncio code to do IO in a certain order.

- you see people in tutorials simulate IO by doing "asyncio.sleep()". It's because it's the easiest way to make the event loop switch context without using the network. It doesn't mean anything, it just pauses and switches, but if you see that in a tutorial, you can mentally replace it with, say, an http call, to get a more realistic picture.

- asyncio comes with a lot of concepts, let's take the time to define them:

    * Future: an object standing in for a result that isn't there yet, with potentially some callbacks to be called once the result is set.

    * Task: a subclass of Future that wraps a coroutine. The coroutine is immediately scheduled in the event loop when the task is instantiated. When you do ensure_future(coroutine), it returns a Task.

    * coroutine: a generator with some syntactic sugar. Honestly that's pretty much it. They don't do much by themselves, except you can use await in them, which is handy. You get one by calling a coroutine function.

    * coroutine function: a function declared with "async def". When you call it, it doesn't run the code of the function. Instead, it returns a coroutine. 

    * awaitable: any object with an __await__ method. This method is what the event loop uses to execute the code asynchronously. Coroutines, tasks and futures are awaitables. Now the dirty secret is this: you can write an __await__ method, but in it, you will mostly call the __await__ from some magical object from deep inside asyncio. Unless you write a framework, don't think too much about it: awaitable = stuff you can pass to ensure_future() to tell the event loop to run it. Also, you can "await" any awaitable.

    * event loop: the magic "while True" loop that takes awaitables and executes them. When the code hits "await", the event loop switches from one awaitable to another, and then goes back to it later.

    * executor: an object that takes code, executes it in a __different__ context, and returns a future you can await in your __current__ context. You will use them to run stuff in threads or separate processes, but magically await the result in your current code like it's regular asyncio. It's very handy to naturally integrate blocking code in your workflow.

    * event loop policy: the stuff that creates the loop. You can override that if you are writing a framework and want to get fancy with the loop. Don't do it. I've done it. Don't.

    * task factory: the stuff that creates the tasks. You can override that if you are writing a framework and want to get fancy with the tasks. Don't do it either.

    * protocols: abstract classes you can implement to tell asyncio __what__ to do when it establishes/loses a connection or sends/receives a packet. asyncio instantiates one protocol for each connection. Problem is: you can't use "await" in protocols, only old-fashioned callbacks.

    * transports: abstract classes you can implement to tell asyncio __how__ to establish/lose a connection or send/receive a packet.
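
A minimal sketch tying several of the pointers above together, assuming Python 3.5.3 or later. The names fetch, blocking_work and report_failure are hypothetical stand-ins, and asyncio.sleep() plays the role of a network call as described above. It tries to show that calling a coroutine function runs nothing, that ensure_future() schedules a Task, that a non-awaited Task needs add_done_callback() to surface its exception, and what the odd argument order of run_in_executor() looks like.

```python
import asyncio
import time


async def fetch(n):
    """Hypothetical stand-in for real network IO."""
    await asyncio.sleep(0.01)
    if n < 0:
        raise ValueError("negative input")
    return n * 2


def blocking_work(n):
    """Hypothetical blocking call that cannot be awaited directly."""
    time.sleep(0.01)
    return n + 1


def report_failure(future, errors):
    """Done callback: a future nobody awaits has to be checked by hand."""
    if not future.cancelled() and future.exception() is not None:
        errors.append(future.exception())


async def main():
    errors = []
    loop = asyncio.get_event_loop()  # inside a coroutine: the running loop

    coro = fetch(1)                     # coroutine object, nothing has run yet
    task = asyncio.ensure_future(coro)  # now it is a Task, scheduled on the loop

    failing = asyncio.ensure_future(fetch(-1))
    failing.add_done_callback(lambda f: report_failure(f, errors))

    # Signature: executor first (None = default thread pool), then the
    # callable, then its positional arguments.
    blocking = loop.run_in_executor(None, blocking_work, 41)

    results = await asyncio.gather(task, blocking)
    # Wait for the failing task to finish without re-raising its error here.
    await asyncio.gather(failing, return_exceptions=True)
    return results, errors


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    try:
        print(loop.run_until_complete(main()))
    finally:
        loop.close()
```

For CPU intensive work, passing a concurrent.futures.ProcessPoolExecutor instance instead of None as the first argument of run_in_executor() should move the call to another process; the callable and its arguments then have to be picklable.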

Now, I'm putting the last point separately because if there is one thing you need to remember, it's this. It's the most underrated secret rule of asyncio. The stuff that is literally written nowhere ever, not in the doc, not in any tuto, etc.

asyncio.gather() is the most important function in asyncio
===========================================================

You see, every time you do asyncio.ensure_future() or loop.run_in_executor(), you actually do the equivalent of a GO TO. (see: https://vorpus.org/blog/notes-on-structured-concurrency-or-g...)

You have no freaking idea of when the code will start or end execution.

To stay sane, you should never, ever, have a dangling awaitable anywhere. Always get a reference on all your awaitables. Decide where in the code you think their life should end.

And at this very point, call asyncio.gather(). It will block until all awaitables are done.

E.G, don't:

    asyncio.ensure_future(bar())
    asyncio.get_event_loop().run_in_executor(None, barz)
    await asyncio.sleep(10)

E.G, do:

    foo = asyncio.ensure_future(bar())
    fooz = asyncio.get_event_loop().run_in_executor(None, barz)
    await asyncio.sleep(10)
    await asyncio.gather(foo, fooz)  # this is The Only True Way

Your code should be a meticulous tree of hierarchical calls to asyncio.gather() that delimits where things are supposed to stop. And if you think it's annoying, wait until you have to debug something whose life cycle you don't have control over.
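
A runnable sketch of that tree, assuming the same bar and barz names as the snippet above (their bodies are made up here, with sleeps standing in for real work). parent() owns its two children and root() owns two parents, so nothing is left running once root() returns.

```python
import asyncio
import time


async def bar():
    await asyncio.sleep(0.05)  # simulated network call
    return "bar done"


def barz():
    time.sleep(0.05)  # blocking call, runs in the default thread pool
    return "barz done"


async def parent():
    foo = asyncio.ensure_future(bar())
    fooz = asyncio.get_event_loop().run_in_executor(None, barz)
    await asyncio.sleep(0.01)  # the parent's own work
    # Both children are finished once gather() returns.
    return await asyncio.gather(foo, fooz)


async def root():
    # One level up: the parents themselves are gathered, forming the tree.
    return await asyncio.gather(parent(), parent())


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    try:
        print(loop.run_until_complete(root()))
    finally:
        loop.close()
```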

Of course it's getting old pretty fast, so you may want to write some abstraction layer such as https://github.com/Tygs/ayo. But I wouldn't use this one in production just yet.

6 comments

mehrdadn 2877 days ago

Awesome comment. One thing I want to point out to those reading is that the nursery thing is an instantiation of the more general principle of, if you're finding your code is getting convoluted, it's likely that you're missing a noun. I can't explain this as well as others have, so see this comment: https://news.ycombinator.com/item?id=16468796


sametmax 2877 days ago

I just love this comment.

I'm going to steal it for my next training on how to design an API.

When you are a computer scientist, you want to think about your data structures so badly first. It fits your brain so well, and it's easier to understand a program from them than the rest of the code.

But it's a trap.


mehrdadn 2877 days ago

Yes, PLEASE do!! :) I've been dying myself to get chances to teach these kinds of ideas! Hardly anyone seems to teach this kind of thoughtful analysis. Eric Lippert deserves an enormous amount of credit for writing this series in particular -- trying to explain these ideas coherently has been a massive struggle for me, let alone trickling them down to a small example that's easy to digest. He's a really awesome guy I look up to... I've learned so much from his writing (this is only one example of many).


mtrovo 2877 days ago

Wow, really nice list, I wish I knew it before I started to work with asyncio.

> stay sane, you should never, ever, have an dangling awaitable anywhere. Always get a reference on all your awaitables. Decide where in the code you think their life should end.

This is the most difficult part for me. It's not trivial to know if a function you're calling is async or not without looking at the function source, especially when you're using external libraries. Also, by default there are no logs about this kind of situation, so it's an easy way to shoot yourself in the foot and waste 10 minutes debugging to find a dangling awaitable on a function call you didn't realize was async.
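
One partial workaround, not mentioned in the thread and only a sketch: the standard inspect module can tell whether something was declared with "async def", and whether a returned value is awaitable. The names async_fetch, sync_fetch, describe and needs_await are hypothetical. It won't catch a regular function that hands back a future, which is why the second check looks at the return value instead.

```python
import asyncio
import inspect


async def async_fetch():
    await asyncio.sleep(0)
    return "data"


def sync_fetch():
    return "data"


def describe(func):
    """Report whether calling func produces a coroutine that must be awaited."""
    if inspect.iscoroutinefunction(func):
        return "coroutine function: await the result"
    return "regular function: may still return an awaitable"


def needs_await(value):
    """True for coroutines, Tasks, Futures and anything with __await__."""
    return inspect.isawaitable(value)


if __name__ == "__main__":
    print(describe(async_fetch))
    print(describe(sync_fetch))
```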


wruza 2877 days ago

And still people vote for async-await because “true light threads are hard to implement at low level”. This generator-based madness has to end, but few seem to understand what hassle it brings to their coding and what an alternative could be. I don’t get it.


sametmax 2877 days ago

That's why you should activate the debug features I mentioned. They will write to the console when you are calling an async thingy without getting the result from it.
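
A small sketch of what that can look like, assuming CPython. loop.set_debug(True) is one way to switch debug mode on (setting the PYTHONASYNCIODEBUG=1 environment variable is another). The save and forgetful coroutines are hypothetical, and the warning is captured here only so the sketch can return the message instead of relying on console output.

```python
import asyncio
import gc
import warnings


async def save(record):
    await asyncio.sleep(0)
    return record


async def forgetful():
    save("row")  # deliberate bug: coroutine created but never awaited
    await asyncio.sleep(0)


def run_with_debug(coro_func):
    """Run coro_func() on a debug-mode loop and return the warnings raised."""
    loop = asyncio.new_event_loop()
    loop.set_debug(True)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            loop.run_until_complete(coro_func())
        finally:
            loop.close()
        gc.collect()  # make sure the forgotten coroutine is finalized
    return [str(w.message) for w in caught]


if __name__ == "__main__":
    for message in run_with_debug(forgetful):
        print(message)
```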


takeda 2877 days ago

Anything that's defined as "async def" and that you call with await and friends should be async.

Yes, it's possible to write a coroutine and use "async def" without any await inside, but in those cases the library authors should just have made it a normal function.

I would say that this is a bug in the library.


multani 2877 days ago

> - don't pass the loop around. Use asyncio.get_event_loop(). This way your code will be independent of the loop creation process.

Eh. I've been passing the loop around as an optional kw argument in most of my code...

The idea was for the code not to depend on a global somewhere (I hate globals) and to "be sure" the loop used among all the code was the same, unless explicitly passed. Of course I never used that "feature". I thought I read this somewhere when I was looking at Twisted and they were saying to pass it explicitly, but I'm not so sure now...


takeda 2877 days ago

You're supposed to have only a single event loop per thread. The standard event loop policy ensures that the value is thread-local (you can change that by modifying the policy), so unless you're doing something unusual with multiple loops in the same thread, you will never need to pass the value.

Also, if you are passing the loop and are doing multithreading, you need to be careful, because if you pass it to another thread you might see weird issues.

I initially also started passing the loop around explicitly, but once I decided to combine asyncio with threads I realized that it is better to trust get_event_loop() to do the job correctly. The only exception is when I need to schedule a coroutine in one thread for another thread. In that case I need the loop from a different thread so I can invoke call_soon_threadsafe().
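
A sketch of that one exception, under the assumption that a second loop runs in a background thread. The helper names (work, start_background_loop, submit, stop_background_loop) are hypothetical. asyncio.run_coroutine_threadsafe() is used for scheduling because it relies on call_soon_threadsafe() internally and hands back a concurrent.futures.Future that the calling thread can block on.

```python
import asyncio
import threading


async def work(x):
    await asyncio.sleep(0.01)  # simulated IO on the background loop
    return x * 2


def start_background_loop():
    """Create a loop and run it forever in a daemon thread."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


def submit(loop, coro):
    """Schedule coro on a loop owned by another thread."""
    # This is the case where the loop has to be passed explicitly.
    return asyncio.run_coroutine_threadsafe(coro, loop)


def stop_background_loop(loop, thread):
    # loop.stop() is not thread-safe, so it is scheduled on the loop itself.
    loop.call_soon_threadsafe(loop.stop)
    thread.join()
    loop.close()


if __name__ == "__main__":
    loop, thread = start_background_loop()
    try:
        print(submit(loop, work(21)).result(timeout=5))
    finally:
        stop_background_loop(loop, thread)
```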


stinos 2877 days ago

> one problem only: when the bottleneck of your program is network IO

Do you mean literally what this says, or are you rather using 'network IO' as some (extremely) abstract term for any type of communication? Just checking because I haven't used asynchronous programming in Python but did so in other languages and we do things like await hardwareAxis.GoToTargetPosition(position=50, velocity=100). Not what most people think of when reading network IO, that one.


sametmax 2877 days ago

While async / await, futures, and the event loop are generic mechanisms, the asyncio module itself only implements selectors for sockets (ok, and subprocess pipes). You can't even do async file system operations with it: you need to call run_in_executor().

Now that doesn't mean you could not implement a selector that does asynchronous UI IO and plug it to the event loop. But the asyncio module doesn't provide it right now, and no lib that I know of does it either.
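
For the file system case, a minimal sketch of the run_in_executor() route, assuming the default thread pool is good enough. read_text, read_text_async and demo are hypothetical helpers; the read itself is still an ordinary blocking call, it just happens in a worker thread while the loop stays free.

```python
import asyncio
import os
import tempfile


def read_text(path):
    """Plain blocking read; this is what runs in the worker thread."""
    with open(path, encoding="utf-8") as f:
        return f.read()


async def read_text_async(path):
    loop = asyncio.get_event_loop()
    # None selects the loop's default thread pool executor.
    return await loop.run_in_executor(None, read_text, path)


def demo():
    """Write a temporary file, then read it back through the event loop."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "note.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello from disk")
        loop = asyncio.new_event_loop()
        try:
            return loop.run_until_complete(read_text_async(path))
        finally:
            loop.close()


if __name__ == "__main__":
    print(demo())
```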


takeda 2877 days ago

Good information, but it all depends on use case, for example I use a lot of await and "async with" in coroutines.

Then start tasks as:

    tasks = [coroutine(i) for i in parameters]

and then iterate over results using

    for task in asyncio.as_completed(tasks):
        result = await task  # results arrive in completion order

You can also start threads and then dispatch coroutines to them.

There are many ways of using it.
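
A self-contained sketch of that pattern, assuming a made-up coroutine(i) that sleeps in proportion to its argument so the completion order is visible. collect is a hypothetical wrapper; it has to be a coroutine itself because each item yielded by as_completed() must be awaited.

```python
import asyncio


async def coroutine(i):
    await asyncio.sleep(0.05 * i)  # simulated IO, longer for larger i
    return i


async def collect(parameters):
    tasks = [asyncio.ensure_future(coroutine(i)) for i in parameters]
    results = []
    # as_completed() yields awaitables in the order the tasks finish,
    # not in the order they were submitted.
    for next_done in asyncio.as_completed(tasks):
        results.append(await next_done)
    return results


if __name__ == "__main__":
    loop = asyncio.new_event_loop()
    try:
        print(loop.run_until_complete(collect([3, 1, 2])))
    finally:
        loop.close()
```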


metalliqaz 2877 days ago

I wish I could favorite comments on HN


chatmasta 2877 days ago

You can. Click on the timestamp and then favorite it.
