by gchpaco 4983 days ago
From hard-learned professional experience: EventMachine isn't written very well. When I say that, I mean specific things like "whoever wrote the subprocess handling in EventMachine wrote code that is uniquely wrong on every platform I've ever heard of." If you are not familiar with this (and are curious, in some sort of macabre way), I urge you to grep the source for SIGCHLD or wait and see what you find.

What EventMachine does (or at least what it did a year or so ago, when I was debugging this) is this: subprocesses are equated to popen. When the input side of that pipe closes (that is, when the subprocess closes STDOUT), the process is finally waited upon, and if it doesn't terminate within a hard-coded timeout (which, by the way, blocks the rest of your program), it is forcibly gunned down, with SIGKILL if necessary.
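To make the described policy concrete, here is a minimal Python sketch of it. This is a hypothetical reconstruction from the description above, not EventMachine's source; `popen_style_reap` is an invented name, and the 0.5 s timeout stands in for the much longer hard-coded one. The child closes STDOUT and keeps working, which is enough to trigger the wait, the block, and the kill:

```python
import signal
import subprocess
import sys
import time


def popen_style_reap(cmd, timeout=0.5):
    """Hypothetical reconstruction of the policy described above (NOT
    EventMachine's source): treat EOF on the child's stdout as "done",
    wait up to a hard-coded timeout, then SIGKILL the child if it is
    still alive. The caller is blocked for the whole sequence."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    output = proc.stdout.read()  # returns only once stdout reaches EOF
    proc.stdout.close()
    try:
        proc.wait(timeout=timeout)  # blocks: nothing else runs meanwhile
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGKILL)  # "forcibly gunned down"
        proc.wait()
    return output, proc.returncode


# A child that closes stdout and then simply keeps working.
child = [sys.executable, "-c",
         "import os, time; print('hi', flush=True); os.close(1); time.sleep(5)"]
start = time.monotonic()
output, status = popen_style_reap(child)
elapsed = time.monotonic() - start
```

The caller sits inside `proc.wait(timeout=...)` the whole time; in a single-threaded event loop, that means every other connection is stalled too.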

Among the problems that arise here: suppose you launch a daemon from within a subprocess you are managing with EventMachine, and the daemon script is not written to unconditionally drop STDOUT upon forking (few are). The subprocess itself will terminate quickly, the daemon will go on its merry way, and your driver program will never, ever tell you the subprocess has finished running until that daemon is dead, along with anything it has spawned that might possibly use STDOUT. And god forbid the daemon close that stream and then dare to continue running, for EventMachine will shoot it dead within, IIRC, 20 seconds, and lock up your driver program for the duration to boot.
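The root of the daemon problem is that EOF on a pipe means every process holding its write end has closed it, not that the direct child has exited. A minimal sketch (POSIX only; `sleep 1 &` is a stand-in for a daemon that inherits STDOUT, and `exit_vs_eof` is a hypothetical helper) shows the two events coming apart:

```python
import subprocess
import time


def exit_vs_eof(cmd):
    """Return (seconds until the direct child exits, seconds until EOF on its stdout)."""
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    proc.wait()  # the direct child is gone almost immediately...
    exited = time.monotonic() - start
    proc.stdout.read()  # ...but EOF waits for every process holding the write end
    eof = time.monotonic() - start
    proc.stdout.close()
    return exited, eof


# The "shell script" backgrounds a long-lived process (standing in for a
# daemon) that inherits the script's stdout, then exits at once.
exited, eof = exit_vs_eof(["sh", "-c", "sleep 1 & exit 0"])
```

A popen implementation that only reaps after EOF will report this "shell script" as running for as long as the background process lives.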

Programmers who understand how the Unix process model works will write a very small SIGCHLD handler that writes a byte to a pipe (or uses some similar method) to notify the main event loop, which then calls wait on the child immediately and closes its end of that child's pipes. I am reliably informed by those who understand the Windows process model that what EventMachine does is even more wrong there. This is a subsystem that was not written by anyone who knew what popen does, who could not be bothered (or was perhaps not competent) to read what any of a dozen standard implementations of it do, and who appears to have debugged the code into some form of submission and then released it upon an unsuspecting public.
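A minimal sketch of that approach in Python, assuming a POSIX system: the handler only writes a byte to a non-blocking self-pipe, and a select loop wakes on it and reaps with non-blocking `waitpid`. The helper names are hypothetical, and `waitpid(-1, ...)` reaps any child of the process, so a real program would restrict itself to the pids it owns. The children here are bare `fork`s with no pipes, so the pipe-closing step is only marked in a comment.

```python
import os
import selectors
import signal

# Self-pipe: the SIGCHLD handler only writes one byte; all real work
# happens later in the event loop, outside signal context.
rfd, wfd = os.pipe()
os.set_blocking(rfd, False)
os.set_blocking(wfd, False)


def on_sigchld(signum, frame):
    try:
        os.write(wfd, b"\0")
    except BlockingIOError:
        pass  # pipe already full, so a wakeup is pending anyway


def reap_children():
    """Reap every child that has exited, without blocking."""
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children remain, but none has exited yet
        reaped[pid] = os.waitstatus_to_exitcode(status)
        # a real loop would also close this child's pipe ends here
    return reaped


def run_until_reaped(pids, timeout=5.0):
    """Minimal select loop: sleep until the self-pipe is readable, then reap."""
    sel = selectors.DefaultSelector()
    sel.register(rfd, selectors.EVENT_READ)
    pending, results = set(pids), {}
    while pending:
        if not sel.select(timeout):
            raise TimeoutError("children did not exit")
        try:
            while os.read(rfd, 4096):
                pass  # drain: several SIGCHLDs may share one wakeup
        except BlockingIOError:
            pass
        for pid, code in reap_children().items():
            results[pid] = code
            pending.discard(pid)
    sel.close()
    return results


signal.signal(signal.SIGCHLD, on_sigchld)  # install before forking

children = []
for code in (0, 3, 7):
    pid = os.fork()
    if pid == 0:
        os._exit(code)  # child: exit immediately with a known status
    children.append(pid)

results = run_until_reaped(children)
```

Because signals coalesce, one wakeup can stand for several exits; that is why the loop reaps until `waitpid` reports nothing left rather than once per byte. Python also packages the self-pipe half of this pattern as `signal.set_wakeup_fd`.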

This is the only colossally wrong decision they made that I can list off the top of my head, but that's because it was so stupid that I stopped looking for trouble after finding it. EventMachine does not handle anything beyond a very straightforward select loop very well, and I am sufficiently terrified of what lives under the covers in that system that I would rather write the select loop by hand (massive pain though that may be) than let this system anywhere near it. What really alarms me is that people build walls of cardboard like NeverBlock (which reaches deep into the guts of Ruby's I/O and replaces it with EventMachine-driven coroutines) atop this foundation of sand, and then wonder why it falls over sideways in an impenetrable, impossible-to-debug fashion.

Coroutine programming (for that is, essentially, what we are talking about) can be a very elegant way to solve certain problems, but it works best when it is simple, or at least localized (e.g. samefringe). In an event-driven server, every little piece must be audited carefully to ensure that it does not block. You get all the same problems any preemptive concurrency model has, plus some added nastiness; in exchange you get slightly better scalability numbers. It is at its best in a fairly simple program such as, say, nginx in its proxy configuration, where it speaks streams and SSL and hands anything sophisticated to an application server on the other end of a different stream.
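Samefringe asks whether two trees have the same leaves in the same left-to-right order, regardless of shape. A minimal sketch using Python generators (semi-coroutines standing in for full coroutines; trees are assumed to be nested tuples, and the function names are illustrative) walks both fringes in lockstep and stops at the first mismatch, without flattening either tree:

```python
from itertools import zip_longest


def leaves(tree):
    """Lazily yield the fringe of a nested-tuple tree, left to right."""
    if isinstance(tree, tuple):
        for child in tree:
            yield from leaves(child)
    else:
        yield tree


_END = object()  # sentinel so fringes of different lengths never compare equal


def same_fringe(a, b):
    """Compare two fringes leaf by leaf; all() stops at the first mismatch."""
    return all(x == y for x, y in zip_longest(leaves(a), leaves(b), fillvalue=_END))
```

Each generator suspends after every leaf, so a mismatch near the start of two huge trees is detected without visiting the rest. That locality is what makes the coroutine version pleasant here.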


Nobody uses EventMachine to manage daemon processes. Subprocesses in EventMachine aren't "equated to popen"; they exist for the sole purpose of doing evented I/O popen-style. I'd be careful about calling a developer "incompetent" because they write something that doesn't admit of arbitrary use cases.
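For comparison, here is roughly what evented, popen-style I/O looks like in Python's asyncio (an analogue only; this is not EventMachine's API). The child's output is read line by line without stalling other tasks, and the exit status comes from awaiting the process itself rather than being inferred from EOF on its stdout. `evented_popen` and the ticker are hypothetical helpers.

```python
import asyncio
import sys


async def evented_popen(cmd):
    """Read a child's stdout line by line without blocking the event loop."""
    proc = await asyncio.create_subprocess_exec(*cmd, stdout=asyncio.subprocess.PIPE)
    lines = []
    async for line in proc.stdout:  # each read yields to other tasks
        lines.append(line.decode().rstrip("\n"))
    returncode = await proc.wait()  # resolves when the process exits, not on EOF
    return lines, returncode


async def main():
    ticks = 0

    async def ticker():  # stands in for the rest of an event-driven program
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(ticker())
    child = [sys.executable, "-c",
             "import time\nfor i in range(3):\n    print(i, flush=True); time.sleep(0.1)"]
    result = await evented_popen(child)
    task.cancel()
    return result, ticks


(lines, returncode), ticks = asyncio.run(main())
```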
I'm being elliptical about why here because I don't think I can talk about the internal architecture of that system in public, but warning people off one particularly stupid third-party bug that we fixed in our internal fork is not, I believe, a problem. Anyway, we certainly did use it to manage daemon processes, although not deliberately; we had a daemon that communicated with external software about system events, and running shell scripts was part of that. We didn't necessarily anticipate folks running 'service httpd start' in those shell scripts, but it was not an inherently unreasonable thing to do.

And this isn't "arbitrary use cases"; this is an explicitly supported function that is completely contrary to good practice and sane behavior and, to boot, can arbitrarily kill programs for impenetrable reasons and block for significant periods of time (the central sin of event-driven programming). You can't tell me that if you saw something like this in a random crypto library you wouldn't immediately tell everybody to stop using it; why should EM's developers get a pass for their, yes, incompetently written popen? I would actually be considerably happier if it weren't in the library at all; at least then it wouldn't be wrong.
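The "central sin" is easy to demonstrate. A minimal sketch using Python's asyncio as a stand-in event loop (not EventMachine; `count_ticks` is a hypothetical helper): one task ticks every 10 ms while another does 0.3 s of "work", either by blocking with `time.sleep` or by yielding with `asyncio.sleep`.

```python
import asyncio
import time


async def count_ticks(blocking, duration=0.3):
    """Run a 10 ms ticker beside one slow task; return how many ticks fired."""
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    async def slow_task():
        if blocking:
            time.sleep(duration)  # blocks the whole loop: nothing else runs
        else:
            await asyncio.sleep(duration)  # yields: the ticker keeps going

    task = asyncio.create_task(ticker())
    await asyncio.sleep(0)  # let the ticker take its first step
    await slow_task()
    task.cancel()
    return ticks


blocked = asyncio.run(count_ticks(blocking=True))
cooperative = asyncio.run(count_ticks(blocking=False))
```

In the blocking run the ticker never gets the loop back until `time.sleep` returns, which is exactly what a hard-coded wait inside a library does to every other connection.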

Do you have other examples of how badly constructed EventMachine is, or is it just that you can't use their process I/O stuff as a daemon manager?

I was using Adam Langley's net/ssl code in Golang to build an HTTPS proxy, and only after several hours of hair-pulling did I discover that Langley hadn't implemented the compat SSL2 handshake that Firefox uses with proxies. net/ssl in Go was, for no good reason other than an omission, unsuited for use as an HTTPS proxy. Should I say net/ssl was incompetently written? That seems like a bad idea to me.
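The "compat SSL2 handshake" here refers to the backward-compatible ClientHello that the TLS specifications let a client send inside an SSL 2.0 record header. As a hedged illustration (Python rather than Go, not taken from Go's TLS code, and with a hypothetical function name), a server can tell the formats apart from the first five bytes:

```python
def classify_client_hello(data: bytes) -> str:
    """Classify a connection's first bytes by ClientHello record format.

    Sketch only: inspects header bytes, does not parse or validate the hello.
    """
    if len(data) < 5:
        return "need more data"
    b0, b1, b2, b3, b4 = data[:5]
    if b0 == 0x16 and b1 == 0x03:
        # SSL 3.0/TLS record: content type 22 (handshake), major version 3.
        return "tls record"
    if b0 & 0x80 and b2 == 0x01:
        # SSL 2.0 two-byte header (high bit set); message type 1 = CLIENT-HELLO.
        if b3 == 0x03:
            return "sslv2-compatible hello"  # offers SSL 3.0 or TLS 1.x
        if (b3, b4) == (0x00, 0x02):
            return "sslv2 hello"  # plain SSL 2.0 client
    return "unknown"
```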