Plain sanic runs much faster than the uvicorn-ASGI-sanic stack used in the benchmark, and the ASGI API in the middle is probably degrading other async frameworks' performance too. But then this benchmark also has other major issues, like using HTTP/1.0 without keep-alive in its Nginx proxy_pass config (keep-alive again has a huge effect on performance, and would be enabled on real performance-critical servers). https://sanic.readthedocs.io/en/latest/sanic/nginx.html
You're not completely off. There might be issues with async/await overhead that would be solved by a JIT, but also if you're using asyncio, the first _sensible_ choice to make would be to swap out the default event loop with one actually explicitly designed to be performant, such as uvloop's one, because asyncio.SelectorEventLoop is designed to be straightforward, not fast.
There's also the major issue of backpressure handling, but that's a whole other story, and not unique to Python.
My major issue with the post I replied to is that there are a bunch of confounding issues that make the comparison given meaningless.
If I/O-bound tasks are the problem, that would tend to indicate an issue with I/O event loop, not with Python and its async/await implementation. If the default asyncio.SelectorEventLoop is too slow for you, you can subclass asyncio.AbstractEventLoop and implement your own, such as buildiong one on top of uvloop. And somebody's already done that: https://github.com/MagicStack/uvloop
Moreover, even if there's _still_ a discrepancy, unless you're profiling things, the discussion is moot. This isn't to say that there aren't problems (there almost certainly are), but that you should get as close as possible to an apples-to-apples comparison first.
When I talk about async await I'm talking about everything that encompasses supporting that syntax. This includes the I/O event loop.
So really we're in agreement. You're talking about reimplementing python specific things to make it more performant, and that is exactly another way of saying that the problem is python specific.
No, we're not in agreement. You're confounding a bunch of independent things, and that is what I object to.
It's neither fair nor correct to mush together CPython's async/await implementation with the implementation of asyncio.SelectorEventLoop. They are two different things and entirely independent of one another.
Moreover, it's neither fair nor correct to compare asyncio.SelectorEventLoop with the event loop of node.js, because the former is written in pure Python (with performance only tangentally in mind) whereas the latter is written in C (libuv). That's why I pointed you to uvloop, which is an implementation of asyncio.AbstractEventLoop built on top of libuv. If you want to even start with a comparison, you need to eliminate that confounding variable.
Finally, the implementation matters. node.js uses a JIT, while CPython does not, giving them _much_ different performance characteristics. If you want to eliminate that confounding variable, you need to use a Python implementation with a JIT, such as PyPy.
Do those two things, and then you'll be able to do a fair comparison between Python and node.js.
Except the problem here is that those tests were bottlenecked by IO. Whether you're testing C++, pypy, libuv, or whatever it doesn't matter.
All that matters is the concurrency model because that application he's running is barely doing anything else except IO and anything outside of IO becomes negligible because after enough requests, those sync worker processes will all be spending the majority of their time blocked by an IO request.
The basic essence of the original claim is that sync is not necessarily better than async for all cases of high IO tasks. I bring up node as a counter example because that async model IS Faster for THIS same case. And bringing up node is 100% relevant because IO is the bottleneck, so it doesn't really matter how much faster node is executing as IO should be taking most of the time.
Clearly and logically the async concurrency model is better for these types of tasks so IF tests indicate otherwise for PYTHON then there's something up with python specifically.
You're right, we are in disagreement. I didn't realize you completely failed to understand what's going on and felt the need to do an apples to apples comparison when such a comparison is not Needed at all.
Nodejs is faster than Python as a general rule, anyway. As I understand, Nodejs compiles Javascript, Python interprets Python code.
I do a lot of Django and Nodejs and Django is great to sketch an app out, but I've noticed rewriting endpoints in Nodejs directly accessing postgres gets much better performance.
CPython, the reference implementation, interprets Python. PyPy interprets and JIT compiles Python, and more exotic things like Cython and Grumpy statically compiles Python (often through another, intermediate language like C or Go).
Node.js, using V8, interprets and JIT compiles JavaScript.
Although note that, while Node.js is fast relative to Python, it's still pretty slow. If you're writing web-stuff, I'd recommend Go instead for casually written, good performance.
The compare between Django against no-ORM is a bit weird given that rewriting your endpoint in python without Django or ORM would also have produced better results I suppose.
Right but this test focused on concurrent IO. The bottleneck is not the interpreter but the concurrency model. It doesn't matter if you coded it in C++, the JIT shouldn't even be a factor here because the bottleneck is IO and therefore ONLY the concurrency model should be a factor here. You should only see differences in speed based off of which model is used. All else is negligible.
So you have two implementations of async that are both bottlenecked by IO. One is implemented in node. The other in python.
The node implementation behaves as expected in accordance to theory meaning that for thousands of IO bound tasks it performs faster then a fixed number of sync worker threads (say 5 threads).
This makes sense right? Given thousands of IO bound tasks, eventually all 5 threads must be doing IO and therefore blocked on every task, while the single threaded async model is always context switching whenever it encounters an IO task so it is never blocked and it is always doing something...
Meanwhile the python async implementation doesn't perform in accordance to theory. 5 async workers is slower then 5 sync workers on IO bound tasks. 5 sync workers should eventually be entirely blocked by IO and the 5 async workers should never be blocked ever... Why is the python implementation slower? The answer is obvious:
It's python specific. It's python that is the problem.