by chapium 2316 days ago
This is clearly not my subject area. Why would we be spawning processes for HTTP requests? This sounds awful for performance.

My best guess is a security guarantee.

4 comments

Not spawning, forking. Web servers were simple “accept(2) then fork(2)” loops for a long time. This is, for example, how inetd(8) works. Later, servers like Apache were optimized to “prefork” (i.e. to maintain a set of idle processes waiting for work, that would exit after a single request.)
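
For anyone who hasn't seen one, the “accept(2) then fork(2)” loop described above can be sketched in a few lines of Python. This is only an illustration, not how inetd or Apache are actually written: `serve_forking`, `echo_pid`, and the `max_requests` cap are names invented here, and a real server would reap children asynchronously rather than at the end.

```python
import os
import socket


def echo_pid(conn):
    """Hypothetical request handler: echo the payload plus the handling PID."""
    data = conn.recv(1024)
    conn.sendall(data + b" from " + str(os.getpid()).encode())


def serve_forking(listener, handle, max_requests):
    """Accept max_requests connections, forking one child per connection."""
    children = []
    for _ in range(max_requests):
        conn, _addr = listener.accept()
        pid = os.fork()
        if pid == 0:
            # Child: it owns this one connection and exits when done.
            status = 0
            try:
                listener.close()
                handle(conn)
                conn.close()
            except Exception:
                status = 1
            finally:
                os._exit(status)
        # Parent: drop its copy of the socket and go back to accept().
        conn.close()
        children.append(pid)
    # Reap the children so they do not linger as zombies.
    for pid in children:
        os.waitpid(pid, 0)
```

Because each child exits after one request, whatever it leaked or corrupted goes away with it, which is the property the comment is pointing at.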

Long-running worker threads came a long time later, and were indeed intensely criticized from a security perspective at the time, given that they’d be one use-after-free away from exposing a previous user’s password to a new user. (FCGI/WSGI was criticized for the same reason, as compared to the “clean” fork+exec subprocess model of CGI.)

Note that in the context of longer-running connection-oriented protocols, servers are still built in the “accept(2) then fork(2)” model. Postgres forks a process for each connection, for example.

One less-discussed benefit of the forking model is that it lets the OS “see” individual requests, and so apply CPU/memory/IO quotas to each one, so that one request's resource usage doesn't spill over onto successive requests handled by the same worker. Also, the OOM killer will just kill a request, not the whole server.
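
A rough sketch of the per-request quota idea, assuming a Unix-like system where `RLIMIT_CPU` is enforced: the child lowers its own soft CPU limit before doing the work, and the kernel terminates only that child when the limit is hit. `run_with_cpu_limit` and `spin` are hypothetical names for illustration, and real servers may use cgroups or other mechanisms instead.

```python
import os
import resource
import signal


def spin():
    """Hypothetical runaway request: burns CPU until the kernel steps in."""
    while True:
        pass


def run_with_cpu_limit(work, cpu_seconds):
    """Run work() in a forked child whose soft CPU limit is cpu_seconds."""
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            # Lower only the soft limit; keep whatever hard limit we inherited.
            _soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
            resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, hard))
            work()
            status = 0
        finally:
            os._exit(status)
    # Parent: the limit applied to the child only, so we just collect its fate.
    _pid, wait_status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(wait_status):
        return ("signaled", os.WTERMSIG(wait_status))
    return ("exited", os.WEXITSTATUS(wait_status))
```

Calling `run_with_cpu_limit(spin, 1)` should report that the child was killed by a signal (typically `SIGXCPU`) while the parent carries on, which is the "kill a request, not the whole server" behaviour.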

Thanks for that last paragraph, I'd never thought about that aspect of processes. Learned something new today.
PHP these days doesn't fork and spawn a new process, though it does create a new interpreter context.

In the old cgi-bin days, every web request would fork and exec a new script, whether PHP, Perl, C program, etc. That was replaced with Apache modules (or NSAPI, etc.), then later with long-running, process-pooled frameworks like FastCGI, php-fpm, etc. Perl and PHP typically then didn't fork for every request, but they did create a fresh interpreter context to stay backward compatible, avoid memory leaks, etc. So there's still overhead, but not as heavy as fork/exec.
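
To make the cgi-bin model concrete, here is a minimal sketch of one fork+exec per request: a brand-new interpreter is started for every call and the request is passed through CGI-style environment variables. `run_cgi` and `CGI_SCRIPT` are made up for this example; a real web server would also wire up stdin, more environment variables, and timeouts.

```python
import os
import subprocess
import sys

# Hypothetical CGI script: reads the request from the environment and
# writes headers, a blank line, then the body to stdout.
CGI_SCRIPT = r'''
import os
print("Content-Type: text/plain")
print()
print("query=" + os.environ.get("QUERY_STRING", ""))
'''


def run_cgi(query_string):
    """Handle one request by spawning a fresh interpreter (fork + exec)."""
    env = dict(
        os.environ,
        GATEWAY_INTERFACE="CGI/1.1",
        REQUEST_METHOD="GET",
        QUERY_STRING=query_string,
    )
    result = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The cost here is dominated by starting the interpreter on every request, which is the overhead that the module and process-pool approaches were meant to avoid.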

The (web) world used to be synchronous. Traditional Apache spawns a number of threads and then keeps each thread around for x number of requests, after which the thread is killed and a new one spawned. Incredibly useful feature when you're on limited hardware and want to ensure you don't memory leak yourself out of existence. Modern Apache has newer options (and of course nginx has traditionally been entirely async on multiple threads).
Killing a process is much safer than killing a thread, and the OS does cleanup.

It's not great for maximizing performance, but it's not 100s of milliseconds either. Forking doesn't take long; what is slow is scripting languages loading their runtimes, but you can fork after that's loaded. If hardware is cheaper than the opportunity cost of debugging leaks instead of adding new features, it makes sense.

I measured less than half a millisecond to fork, print time, and wait for child to exit.

http://paste.dy.fi/NEs/plain

So forking alone doesn't cap performance too much; one or two cores could handle >1000 requests per second (billions per month).
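
If you want to repeat the measurement yourself, something like the following sketch would do. It times fork + exit + wait from an interpreter that is already loaded, so it reflects the "fork after the runtime is loaded" case. `mean_fork_seconds` and `requests_per_month` are hypothetical helpers, the 30-day month is an assumption, and forking from Python will likely be slower than the equivalent C program.

```python
import os
import time


def mean_fork_seconds(iterations):
    """Average wall-clock time to fork a child, let it exit, and reap it."""
    start = time.perf_counter()
    for _ in range(iterations):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately without running cleanup handlers.
            os._exit(0)
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / iterations


def requests_per_month(per_second, days=30):
    """Convert a sustained per-second rate into a monthly total."""
    return per_second * 60 * 60 * 24 * days
```

For the arithmetic above, `requests_per_month(1000)` is 2,592,000,000, i.e. about 2.6 billion requests in a 30-day month.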