Hacker News new | ask | show | jobs
by talideon 2204 days ago
If I/O-bound tasks are the problem, that would tend to indicate an issue with I/O event loop, not with Python and its async/await implementation. If the default asyncio.SelectorEventLoop is too slow for you, you can subclass asyncio.AbstractEventLoop and implement your own, such as buildiong one on top of uvloop. And somebody's already done that: https://github.com/MagicStack/uvloop

Moreover, even if there's _still_ a discrepancy, unless you're profiling things, the discussion is moot. This isn't to say that there aren't problems (there almost certainly are), but that you should get as close as possible to an apples-to-apples comparison first.

1 comments

When I talk about async await I'm talking about everything that encompasses supporting that syntax. This includes the I/O event loop.

So really we're in agreement. You're talking about reimplementing python specific things to make it more performant, and that is exactly another way of saying that the problem is python specific.

No, we're not in agreement. You're confounding a bunch of independent things, and that is what I object to.

It's neither fair nor correct to mush together CPython's async/await implementation with the implementation of asyncio.SelectorEventLoop. They are two different things and entirely independent of one another.

Moreover, it's neither fair nor correct to compare asyncio.SelectorEventLoop with the event loop of node.js, because the former is written in pure Python (with performance only tangentally in mind) whereas the latter is written in C (libuv). That's why I pointed you to uvloop, which is an implementation of asyncio.AbstractEventLoop built on top of libuv. If you want to even start with a comparison, you need to eliminate that confounding variable.

Finally, the implementation matters. node.js uses a JIT, while CPython does not, giving them _much_ different performance characteristics. If you want to eliminate that confounding variable, you need to use a Python implementation with a JIT, such as PyPy.

Do those two things, and then you'll be able to do a fair comparison between Python and node.js.

Except the problem here is that those tests were bottlenecked by IO. Whether you're testing C++, pypy, libuv, or whatever it doesn't matter.

All that matters is the concurrency model because that application he's running is barely doing anything else except IO and anything outside of IO becomes negligible because after enough requests, those sync worker processes will all be spending the majority of their time blocked by an IO request.

The basic essence of the original claim is that sync is not necessarily better than async for all cases of high IO tasks. I bring up node as a counter example because that async model IS Faster for THIS same case. And bringing up node is 100% relevant because IO is the bottleneck, so it doesn't really matter how much faster node is executing as IO should be taking most of the time.

Clearly and logically the async concurrency model is better for these types of tasks so IF tests indicate otherwise for PYTHON then there's something up with python specifically.

You're right, we are in disagreement. I didn't realize you completely failed to understand what's going on and felt the need to do an apples to apples comparison when such a comparison is not Needed at all.

No, I understand. I just think that your comparison with _node.js_ when there are a bunch of confounding variables is nonsense. Get rid of those and then we can look at why "nodejs will beat flask in this same exact benchmark".
> I just think that your comparison with _node.js_ when there are a bunch of confounding variables is nonsense

And I'm saying all those confounding variables you're talking about are negligible and irrelevant.

Why? Because the benchmark test in the article is a test where every single task is 99% bound by IO.

What each task does is make a database call AND NOTHING ELSE. Therefore you can safely say that for either python or Node request less than 1% of a single task will be spent on processing while 99% of the task is spent on IO.

You're talking about scales on the order of 0.01% vs. 0.0001%. Sure maybe node is 100x faster, but it's STILL NEGLIGIBLE compared to IO.

It it _NOT_ Nonsense.

You Do not need an apples to apples comparison to come to the conclusion that the problem is Specific to the python implementation. There ARE NO confounding variables.

> And I'm saying all those confounding variables you're talking about are negligible and irrelevant.

No, you're asserting something without actual evidence, and the article itself doesn't actually state that either: it contains no breakdown of where the time is spent. You're assuming the issue lies in one place (Python's async/await implementation) when there are a bunch of possible contributing factors _which have not been ruled out_.

Unless you've actually profiled the thing and shown where the time is used, all your assertions are nonsense.

Show me actual numbers. Prove there are no confounding variables. You made an assertion that demands evidence and provided none.