by jerf 4692 days ago
It appears to add some amount of Python-bytecode-level preemption to gevent, which hopefully lets you avoid some of the pathological cases of cooperative scheduling. Said pathological cases are only a matter of scale... if your program becomes large enough, you will hit them, eventually.

That said, with no offense intended to mirman, I'd really hesitate before using this for anything serious enough to reach that scale in the first place. Gevent, frankly, visibly pushes Python to the limits (and occasionally a bit beyond); trying to also tack some preemption onto an environment not fundamentally expecting it would scare me another notch.

2 comments

No offense taken - both of these are visibly delicate.

There is a version in the history that used Greenlet instead of gevent which was potentially a bit less delicate, but it required wrapping of the main file and didn't work with time.sleep, and I didn't feel like it was worth writing my own locks, semaphores, mutexes, pipes and whatnot.

What are the pathological cases one runs into with gevent?
With a cooperative scheduler, you will, sooner or later, experience some form of starvation. The most obvious is the process that just loops infinitely, but less obvious situations will pop up too: calls that you thought were handled by the event loop but turn out to block, and that add up when you start making them at scale; strange behavior when a set of processes turns out to yield far less often than you thought, and the system starts running at much higher latency than it should once you get too many of them; and all kinds of similar manifestations. There's also the inability to create things that watch other things: if something does go spinning off into infinity, nothing else gets to run to kill it.
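
To make the "calls that turn out to block" case concrete, here is a minimal sketch. It uses the standard library's asyncio as a stand-in cooperative scheduler rather than gevent, and the task names (`heartbeat`, `hog`) are made up for illustration. A task that calls a blocking `time.sleep` never hands control back to the loop, so the heartbeat task goes silent for the whole duration.

```python
import asyncio
import time

async def heartbeat(ticks, interval=0.01):
    # Records a timestamp every time the event loop lets this task run.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)

async def hog(duration=0.2):
    # Hypothetical "secretly blocking" call: time.sleep never yields
    # to the event loop, so no other task can run until it returns.
    time.sleep(duration)

async def main():
    ticks = []
    hb = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0.05)   # heartbeat ticks normally here
    await hog()                 # heartbeat is starved here
    await asyncio.sleep(0.05)   # heartbeat resumes
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    # Largest silence between two consecutive heartbeat ticks.
    return max(b - a for a, b in zip(ticks, ticks[1:]))

worst_gap = asyncio.run(main())
print(f"worst heartbeat gap: {worst_gap:.3f}s (requested interval: 0.010s)")
```

The worst gap comes out at roughly the length of the blocking call instead of the requested interval. A watchdog task written the same way would be starved too, which is the "nothing else gets to run to kill it" problem.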

You can hack around many of them, but you eventually hit a wall, and the effort of the hack increases rapidly.

Note I didn't really refer to gevent in this reply; this is just about cooperative scheduling in general. There's a reason why we've all but completely abandoned it at the OS level for things we'd call "computers" (as opposed to "embedded systems" or "microcontrollers", etc). I tend to consider cooperative scheduling an enormous red flag in any system that uses it... and yes, that completely and fully includes the currently-popular runtimes that use it.

It sounds like this sort of problem comes from using a cooperative scheduler to implement concurrency of arbitrary routines rather than control flow. I haven't been in a situation in which it would even be possible for something to yield less often than I expect, because I expect it to run until it yields. Similarly I don't often find that subroutines return too infrequently because I expect them to run until they return.

This library is probably nice for the places I would otherwise use threads.

You will eventually, at scale, be wrong about that. Having full and correct knowledge of exactly how long your code takes to run, sufficient to do this sort of scheduling correctly, by hand, before ever running it, is basically equivalent to claiming that you never need to profile code because you already know exactly how long it takes. And it is well known and established to my satisfaction that even absolute, total experts in a field will still often be surprised about what actually comes out of a profiler, even in code strictly in their domains. You may well be right most of the time... but that is all you can hope for.
If it takes more than 16.67ms to run a frame's worth of update-and-draw, then it does, and replacing "wake up every in-game entity that asked to wake up this frame" with "let a preemptive scheduler manage ~10,000 threads that want to wake up, do almost nothing, and then sleep for k frames, while some master thread waits on a latch until they're all done" seems unlikely to make it any faster. If the logic my server must perform to handle a request is expensive, then it is, and replacing an event loop with a single-threaded preemptive scheduler will not increase throughput.
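
For reference, the "wake up every in-game entity that asked to wake up this frame" pattern can be sketched with plain generators, assuming each entity yields the number of frames it wants to sleep. The names `run_frames`, `blinker` and `one_shot` are hypothetical, and this is only one way such a loop might look, not anyone's actual engine.

```python
import heapq
from itertools import count

def run_frames(entities, n_frames):
    """Wake each generator on the frame it asked for; return the wake-up log."""
    tie = count()  # tie-breaker so heapq never has to compare generators
    queue = [(0, next(tie), name, gen) for name, gen in entities]
    heapq.heapify(queue)
    log = []
    for frame in range(n_frames):
        # Wake everything that is due on this frame, in scheduling order.
        while queue and queue[0][0] <= frame:
            _, _, name, gen = heapq.heappop(queue)
            try:
                sleep_for = next(gen)  # entity runs until it yields
            except StopIteration:
                continue               # entity finished; drop it
            log.append((frame, name))
            wake_at = frame + max(1, sleep_for)
            heapq.heappush(queue, (wake_at, next(tie), name, gen))
    return log

def blinker(period):
    # Does "almost nothing", then sleeps for `period` frames, forever.
    while True:
        yield period

def one_shot():
    # Acts once, sleeps two frames, then finishes.
    yield 2

log = run_frames(
    [("fast", blinker(1)), ("slow", blinker(3)), ("once", one_shot())],
    n_frames=5,
)
print(log)
```

Each entity's state lives in its suspended generator frame, and the scheduler does nothing beyond popping whatever is due. If one entity takes too long inside `next(gen)`, the whole frame is late, which is exactly the trade-off being argued about in this thread.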

I'm not sure why it is difficult to do this sort of thing correctly. The scheduler does next to nothing in the "server with connections managed in coroutines" case and probably makes matters worse in the "storing game state in execution states" case. It could have a positive impact in the server application if one routine is secretly going to crash or run forever, in the sense that the other routines will continue running while the problematic feature is fenced off or fixed.
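
A rough sketch of that last case, using ordinary preemptive threads from the standard library rather than the library under discussion (whose API is not shown in this thread). The function names are made up for illustration. One routine spins forever without yielding, yet the heartbeat keeps ticking and the main thread still gets to run and stop it.

```python
import threading
import time

def spin(stop):
    # Hypothetical runaway routine: a busy loop that never sleeps or yields.
    while not stop.is_set():
        pass

def heartbeat(stop, ticks, interval=0.01):
    # Records a timestamp each time the OS/interpreter schedules this thread.
    while not stop.is_set():
        ticks.append(time.monotonic())
        time.sleep(interval)

stop = threading.Event()
ticks = []
workers = [
    threading.Thread(target=spin, args=(stop,)),
    threading.Thread(target=heartbeat, args=(stop, ticks)),
]
for w in workers:
    w.start()

time.sleep(0.3)  # main thread acts as the watchdog
stop.set()       # it still gets scheduled, so it can fence off the spinner
for w in workers:
    w.join()

print(f"heartbeat ticks while another thread was spinning: {len(ticks)}")
```

The spinner still burns CPU and slows everything else down, so this does nothing for throughput; it only keeps the other routines and the watchdog alive.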