by hosay123 4835 days ago
Sane is in the eye of the beholder. gevent looks nice, but I'd be very hesitant when it comes to actually supporting it in production. It monkey patches the standard library and messes with CPython internals to achieve what it does, greatly increasing the chance it will conflict with some other piece of code (for example, that bizarre ancient internal proprietary library you're using that started life in Fortran, etc.).
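
For anyone unfamiliar with the term, here is a minimal stdlib-only sketch of what monkey patching means. It is not gevent's code (gevent replaces far more, e.g. sockets and threading); `slow_library_call` and `recording_sleep` are made-up names for illustration.

```python
import time

def slow_library_call():
    # Stands in for third-party code that knows nothing about the patch.
    time.sleep(0.01)
    return "done"

calls = []
original_sleep = time.sleep

def recording_sleep(seconds):
    # Replacement installed globally; every lookup of time.sleep now lands here.
    calls.append(seconds)

time.sleep = recording_sleep      # the monkey patch: one global assignment
try:
    result = slow_library_call()  # unmodified code silently uses the patch
finally:
    time.sleep = original_sleep   # restore the real function

print(result, calls)
```

Code that did `from time import sleep` before the patch was applied keeps its reference to the original function, which is one way patched and unpatched code can end up disagreeing.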

In the case of async I'm glad to see a from scratch implementation for the standard library. It's a weird area that necessitates some constructions that there is no really standard Python style for. You only need to look at Twisted and e.g. its method chaining to realize this stuff would need a thorough sanity rework before it ever became standard anyway.

Also, most other implementations take the approach of building their own little world. This is definitely true of Twisted. You write code for Twisted, not for Python. Gevent at least doesn't suffer from this.

6 comments

As someone who's contributed to Eventlet, I've always felt that the only way that it could get over this scary hurdle (and it is legitimately scary) is for it to be integrated within Python itself. Almost all of the weird problems come from fighting with the baked-in assumptions of the Python runtime. Eventlet does try to alleviate the scariness a little bit by allowing you to import "greened" modules rather than changing the global versions, but that has its own problems.

If it were integrated with Python, there would be no monkeypatching, no special magic, it would be just how things work. That said, I'm not at all surprised that Guido doesn't favor a coroutine-based solution; his opposition to general coroutines is as famous as his opposition to anonymous functions. (to clarify: I don't think the @coroutine decorator creates "real" coroutines, any more than generators were already coroutines)
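
To make the parenthetical concrete, here is a small sketch of a plain generator used as a coroutine through `send()`. The `running_average` name is only for illustration.

```python
def running_average():
    # A generator used as a coroutine: it receives values through send()
    # and yields the average so far back to the caller.
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count

averager = running_average()
next(averager)  # prime the generator: run it up to the first yield
results = [averager.send(v) for v in (10, 20, 30)]
print(results)
```

The limitation is that the `yield` has to appear in the generator's own body; an ordinary function called from inside it cannot suspend the whole call stack, which is what greenlet-style coroutines allow.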

You da man, rdw.
I used gevent at first for a project that needed async I/O and it worked really well, but then I switched to Erlang and realized how poor a choice Python is for such tasks. The language really needs to be designed from the start for it (like Go, Rust, Erlang, etc.). Haskell wasn't designed from the start for it, but because of its functional purity, bolting it on was "natural"; it isn't so for Python, IMHO.
Yeah, if you're going to have to do async to be performant then it had better be pretty pervasive throughout all the libraries. Bonus points if the language supports syntax to make async easier as well. Node is beating out Python for server stuff not simply because it is "async", but because it is so much FASTER. The speed of V8 vs. CPython is a big part of that. In fact, vanilla JS doesn't have much to make async programming particularly easy: it has verbose function declaration and no yield mechanisms. Even library-level solutions like promises are merely 'OK'.

Still, it is easier to build a fast server that can handle streams in Node than it is in Python. Async Python? I'll just stick to async JS in that case.

I think there is another issue here. The Python world has watched as Node.js has been eating its lunch on the server side and decided: ah, surely that is because Node.js has async; if we add that too, everyone will love Python again and come crawling back. They are not saying it outright, but I think it is written between the lines.

Except for one thing: as you pointed out, people use Node.js because 1) it is JS and 2) V8 is fast.

The reason you are getting so many downvotes is that your comment is not just silly but flat wrong. Twisted and Tornado were around before nodejs. There are also async frameworks that make writing async code the same as writing synchronous code. I like Tornado, but I am using nodejs for my current app because of the libraries. This is where nodejs really shines: the community and libraries are awesome. Twisted has a lot of libraries, but it has so much going on that many developers find it too complex. Tornado is a much simpler async framework to adapt to and allows you to run Twisted libraries.

Twisted's inlineCallbacks and Tornado's gen module get rid of all the async spaghetti code. This is hard to do with nodejs, but I still chose nodejs because the available libraries made my project quicker to develop.
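
A rough sketch of the idea behind that style of decorator, using only plain Python. This is not Twisted's or Tornado's implementation; `pending`, `fetch`, `inline`, and `fetch_both` are hypothetical names, and the "event loop" is just a queue drained by hand.

```python
from collections import deque

pending = deque()  # hypothetical stand-in for an event loop's ready queue

def fetch(key, callback):
    # Hypothetical callback-style API: the result is delivered later, not returned.
    pending.append(lambda: callback(key.upper()))

def inline(gen_func):
    # Drives a generator: each yielded value is a function taking a callback;
    # the generator is resumed with the result when that callback fires.
    def start(*args, on_done):
        gen = gen_func(*args)

        def step(value=None):
            try:
                operation = gen.send(value)
            except StopIteration as stop:
                on_done(stop.value)  # the generator's return value
                return
            operation(step)

        step()
    return start

@inline
def fetch_both(a, b):
    # Reads top to bottom like synchronous code, with no nested callbacks.
    first = yield lambda cb: fetch(a, cb)
    second = yield lambda cb: fetch(b, cb)
    return first + second

results = []
fetch_both("foo", "bar", on_done=results.append)
while pending:          # run the toy event loop until nothing is queued
    pending.popleft()()
print(results)
```

The body of `fetch_both` stays linear while the decorator handles the callback plumbing, which is the "no spaghetti" effect described above.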

Sorry I didn't express myself correctly (see my reply to akuchling below).

Basically yes, Python has had Twisted for years, and it has had Diesel, Monocle, Tornado, and some others. I am aware of those, and if you've read my comment you saw that I used Twisted enough to know its ins and outs (5 years).

> There are also async frameworks that make writing async code the same as synchronous code.

Yes, there is inlineCallbacks, and I used it. Node.js also has async (https://github.com/caolan/async). But you don't address the main problem that I raised -- fragmentation of libraries. Python is great because it comes with batteries, and then you can find even more batteries everywhere, _except_ if you use an async framework like Twisted, which percolates all the way through your API code. Once your socket.recv() returns a Deferred, that Deferred will bubble up all the way to the user interface. So now you end up searching for or recreating a parallel set of libraries.
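
The bubbling-up effect can be sketched with the standard library's asyncio in place of Twisted (an assumption made so the example is self-contained; a coroutine object plays the role of the Deferred here, and all function names are made up).

```python
import asyncio

async def recv():
    # Lowest layer: stands in for a non-blocking socket read.
    await asyncio.sleep(0)
    return b"42"

async def parse_reply():
    # Must be async because it awaits recv().
    return int(await recv())

async def handle_request():
    # Must be async because it awaits parse_reply(), and so on upward.
    return {"answer": await parse_reply()}

# A plain call does not produce the value, only an awaitable wrapper,
# much as a Twisted function hands back a Deferred.
unstarted = handle_request()
is_plain_value = isinstance(unstarted, dict)
unstarted.close()  # discard the unstarted coroutine cleanly

response = asyncio.run(handle_request())
print(is_plain_value, response)
```

Any synchronous library that wants to call `parse_reply()` has to be rewritten or wrapped, which is where the parallel set of libraries comes from.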

> Twisted has a lot of libraries but it has so much going on that many developers find it too complex.

It is too complex with too many libraries for those who want to take it up but it is not complex and doesn't have enough libraries if you are in it already -- every library you use has to be Twisted now. That's the danger of inventing a new framework.

Yes, it will be standard, but there is already a practical standard -- eventlet and gevent. This is something Node.js doesn't have. I will personally take monkey-patching and the danger that my function will context switch inside while doing IO over using Twisted. I saw a practical benefit from it at least.

I have a question for you since you have a lot of experience with async. Eventually node.js will have generators (when V8 implements ECMAScript 6) which should allow node.js to have something like gevent. What kind of effect do you think this will have on the node.js world?
People have been writing async apps in Python since 1995 (Medusa); Twisted was first published around 2001. It's not like async programming is new to Python.
Sorry I didn't say it correctly, I meant that it seems the renewed interest in async is stemming from watching Node.js get all the attention.

I have been using Twisted for 5 years full time and have also used eventlet and gevent. From talking to others, I have found few who enjoyed or loved Twisted. It was pretty much the only sane way to do concurrent, performant IO for a while. But once the green thread approach came about, I never looked back.

All was well. Then one day Node.js showed up, and it seems it has started to eat Python's lunch -- fast, scripted development on the server side, with some reasonable concurrency. And it was faster too.

Python devs looked at it and couldn't believe their eyes. And I speculate many have concluded it was because everyone was in love with a callback based async IO paradigm. So that's my guess why we are seeing this proposal.

Go was designed for concurrency from the start and even has null pointers and lots of mutation; this makes the concurrency problem easier to solve because you have more control over state.
Meh, I'm not sure where I stand on this argument - I've recently been having an affair with Haskell and I prefer the Haskell Way. The way Haskell models I/O seems confusing at first, but monads make the whole thing more manageable than in any other language I've ever used.

"More control over state" makes me feel funny inside; if you had actually used Haskell then you would probably see that pretty much every language right now is immature in comparison to Haskell when it comes to "control over state".

I also think mutability makes reasoning about large concurrent and/or parallel programs much more difficult.

Agreed, I am learning Erlang currently as well.
My company used gevent in production at very large scale, and we were extremely happy with it. In fact, we ported our existing Django and Flask applications to run under gevent, which was a surprisingly fast process. (Weeks, not months, to port rather large codebases.) We did have to be careful with third-party libraries, like Zookeeper clients, but that was worth the tradeoff. We got the performance of an evented structure without having to rewrite a ton of code.
How does one go about porting Django apps to be compatible with gevent? Did you use gevent within your Django code, or did you build something completely different?
I assume that you use a WSGI server which allocates one greenlet per request (e.g. gevent.wsgi, or Gunicorn's async workers), and make sure that the rest of your code isn't going to block the event loop too much. Once that's done, you can have a whole bunch of HTTP requests being handled at once. That's nice if your server spends most of its time blocking on database requests or something.
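
A small sketch of why that matters, with asyncio tasks standing in for greenlets (an assumption to keep it stdlib-only; this is not gevent's or Gunicorn's API, and the handler names are made up). One handler waits cooperatively, the other blocks the loop.

```python
import asyncio
import time

async def cooperative_handler(request_id):
    await asyncio.sleep(0.05)   # yields to the loop while "waiting on the database"
    return request_id

async def blocking_handler(request_id):
    time.sleep(0.05)            # blocks the whole loop; nothing else can run
    return request_id

async def serve(handler, n=4):
    # Handle n "requests" at once and report how long the batch took.
    start = time.perf_counter()
    results = await asyncio.gather(*(handler(i) for i in range(n)))
    return results, time.perf_counter() - start

cooperative_results, cooperative_time = asyncio.run(serve(cooperative_handler))
blocking_results, blocking_time = asyncio.run(serve(blocking_handler))
print(f"cooperative: {cooperative_time:.2f}s, blocking: {blocking_time:.2f}s")
```

The cooperative batch should finish in roughly the time of one wait, while the blocking batch takes about the sum of all of them; that is the difference a single unpatched blocking call can make.
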
Have you actually got any war stories about gevent, or is this all guesswork about how you think it stands to reason that it would be bad?

Do you think this would be relevant if greenlets were adopted as a part of Python?

I have -- we used Twisted (about 10k lines of code). Then switched to eventlet -- and the code reduced to about 5 or 6K. There were some issues with monkey-patching, but tests showed them, and we fixed them.

As someone mentioned, if they instead standardized on greenlet then monkeypatching talk wouldn't make sense.

If your bizarre ancient internal thing doesn't need to do async I/O, then don't import it monkey-patched. I use Eventlet heavily in production, and this tends to be pretty easy.
Fortran libraries; you mean SciPy? A bunch of the Python numerics world (one of the big growth cases for Python!) is Fortran. LAPACK/ARPACK are still unmatched.