by 0x000xca0xfe 344 days ago
I guess multiprocessing got a bad reputation because it used to be slow and simple so it got looked down upon as a primitive tool for less capable developers.

But the world has changed. Modern systems are excellent for multiprocessing: CPUs are fast, cores are plentiful, and memory bandwidth keeps getting better, while single-thread performance has stalled.

It really is time to reconsider the old mantras. Setting up highly complicated containerized environments to manage a fleet of anemic VMs because Node.js's single-threaded event loop chokes on real traffic is not the future.


That really has nothing to do with the choice to use CGI. You can just as well use Rust with Axum or Actix and get a fully threaded web server without having to fork for every request.

Absolutely, I'm not recommending that everybody go back to using CGI (the protocol). I was responding to this:

> The CGI model may still work fine, but it is an outdated execution model

The CGI model of one process per request is excellent for modern hardware and really should not be scoffed at anymore IMO.

It can utilize big machines and scale to zero, is almost leak-proof because the OS cleans up all used memory and file descriptors, is language-independent, is dead simple to understand, allows finer-grained resource control (max mem, file descriptor count, chroot) than threads, ...
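A rough sketch of that per-process resource control in Python, assuming a POSIX system. Each request goes to a fresh child process whose limits are set between fork and exec. `HANDLER`, `handle_request`, and the limit values are made up for illustration, and chroot is left out because it needs root.

```python
import resource
import subprocess
import sys

# Hypothetical per-request handler: reads the request body on stdin and
# writes the response to stdout. Any executable in any language would do.
HANDLER = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.stdin.read().upper())"]


def set_soft_limit(which: int, wanted: int) -> None:
    """Set the soft limit to `wanted`, clamped to the existing hard limit."""
    _soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    resource.setrlimit(which, (wanted, hard))


def limit_child() -> None:
    """Runs in the child after fork() and before exec() (POSIX only)."""
    set_soft_limit(resource.RLIMIT_NOFILE, 64)  # at most 64 open descriptors
    set_soft_limit(resource.RLIMIT_CPU, 2)      # at most 2 seconds of CPU time
    # resource.RLIMIT_AS can cap address space the same way on Linux.


def handle_request(body: bytes) -> tuple[int, bytes]:
    """One process per request; the OS reclaims its memory and fds on exit."""
    proc = subprocess.run(HANDLER, input=body, capture_output=True,
                          preexec_fn=limit_child, timeout=5)
    return proc.returncode, proc.stdout
```

Calling `handle_request(b"ping")` runs the handler in its own limited process and returns its exit code and output. Note that `preexec_fn` is not safe to combine with threads in the parent, so this is only a sketch of the idea.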

How is this execution model "outdated"?

The part of the execution model that is dated is this:

> having the web server execute stuff in a specific folder inside the document root just seems like a recipe for problems

Typically I've run CGI from a directory outside the document root. That's easy, and I think it was the default?

That said, fork+exec isn't the best for throughput. Especially if the httpd doesn't isolate forking into a separate, barebones, child process, fork+exec involves a lot of kernel work.
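One way to get a feel for that cost is to time it. The sketch below (function names are hypothetical) compares starting a fresh interpreter per request against calling a handler inside an already-running process. It measures fork+exec plus runtime startup together, and the numbers will vary a lot by machine and runtime.

```python
import subprocess
import sys
import time


def handler() -> str:
    """Stand-in for the real request handler (hypothetical)."""
    return "ok"


def average_spawn_seconds(n: int = 10) -> float:
    """Average time to start a fresh interpreter, run nothing, and reap it."""
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / n


def average_call_seconds(n: int = 10) -> float:
    """Average time to call the handler inside an already-running process."""
    start = time.perf_counter()
    for _ in range(n):
        handler()
    return (time.perf_counter() - start) / n
```

Comparing `average_spawn_seconds()` with `average_call_seconds()` gives a rough per-request overhead for the spawn-everything approach on a given box.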

FastCGI or some other method to avoid forking for each request is valuable regardless of runtime. If you have a runtime with high startup costs, even more so.
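To illustrate the idea of paying startup once (this is not the FastCGI wire protocol, just a simplified stand-in), here is a sketch of a long-lived worker process that handles many requests over a pipe. `PersistentWorker` and the one-line-per-request framing are assumptions made for the example.

```python
import subprocess
import sys

# Source of the hypothetical worker. Anything above the loop (imports,
# warm-up, config loading) runs once per worker rather than once per request.
WORKER_SOURCE = r"""
import sys
for line in iter(sys.stdin.readline, ""):
    sys.stdout.write(line.rstrip("\n").upper() + "\n")
    sys.stdout.flush()
"""


class PersistentWorker:
    """Keeps one child process alive and feeds it one request per line."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            [sys.executable, "-u", "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1)

    def handle(self, request: str) -> str:
        """Send one request (must not contain a newline), read one reply."""
        self.proc.stdin.write(request + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self) -> None:
        """Closing stdin ends the worker's loop; then reap the process."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

The trade-off is the one discussed above: no fork per request, but the worker's memory and file descriptors are no longer cleaned up by the OS between requests.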

> FastCGI or some other method to avoid forking for each request is valuable regardless of runtime. If you have a runtime with high startup costs, even more so.

What's the point of using FastCGI compared to a plain http server then? If you are going to have a persistent server running why not just use the protocol you are already using the semantics of?

I don't generally want or need my application server to serve static files, but I may want to serve them on the same hostname (or maybe I don't).

There are potential benefits to having the httpd manage the specifics of client connections as well: if I'm using a single-threaded, process-per-request execution model, keep-alive connections really ruin that. Similarly with client transfer-encoding requests: does my application server need to know about that? Does my application server need to understand HTTP/2 or HTTP/3?

You could certainly do a reverse proxy and use HTTP instead of FastCGI as the protocol between the client-facing httpd and the application server... although then you miss out on some specialty features like X-Sendfile, which accelerates sending files from the application server without actually transferring them through sockets to the httpd. You could add that to an HTTP proxy too, I suppose.
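For anyone who hasn't seen X-Sendfile, a minimal WSGI sketch: the application replies with headers only, and a front-end that has been configured to honor the header sends the file itself. `REPORT_PATH` and `call_app` are hypothetical; whether the header is honored depends entirely on the front-end setup (nginx, for instance, uses X-Accel-Redirect for the same purpose).

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical location of a file the front-end httpd can read directly.
REPORT_PATH = "/srv/files/report.pdf"


def app(environ, start_response):
    """WSGI app that answers with headers only and lets the httpd send the file."""
    start_response("200 OK", [
        ("Content-Type", "application/pdf"),
        ("X-Sendfile", REPORT_PATH),  # the front-end replaces the body with this file
    ])
    return [b""]


def call_app(path: str = "/"):
    """Invoke the app without a server, returning (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

The application process never reads the file's bytes, which is the whole point of the trick.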

Yep, that is definitely problematic. But it also allowed a sprawling ecosystem of tons of small applications that people could just download and put on their website via FTP and do the configuration in the browser afterwards.

This is easy enough for non-technical people or school kids, and it is still how it works for many WordPress sites.

The modern way of deploying things is safer but the extra complexity has pushed many, many folks to just put their stuff on Facebook/Instagram instead of leveling up their devops skills.

Somehow we need to get the simplicity back, I think. Preferably without all the exploits.

What kind of problems? Like, if the administrator put something inside that directory (Unix doesn't have folders) that the web server shouldn't execute? That kind of problem? I've literally never had that problem in my life and I've had web pages for 30 years.

> Like, if the administrator put something inside that directory

Path traversal bugs allowing written files to land in the cgi-bin used to be a huge exploit vector. Interestingly, some software actually relied on being able to write executable files into the document root, so the simple answer of making the permissions more limited is actually not a silver bullet.
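For reference, the usual guard against this class of bug looks something like the sketch below: resolve the requested name against the upload directory and refuse anything that ends up outside it. `UPLOAD_ROOT` and `safe_upload_path` are hypothetical names, and this only covers the path check, not permissions or file types.

```python
import os

# Hypothetical upload directory, kept separate from any cgi-bin.
UPLOAD_ROOT = "/var/www/uploads"


def safe_upload_path(filename: str, root: str = UPLOAD_ROOT) -> str:
    """Return the absolute destination for an upload, or raise ValueError
    if the client-supplied name would escape the upload directory."""
    root_real = os.path.realpath(root)
    # realpath collapses ".." segments and follows symlinks; joining with an
    # absolute filename discards the root, which the check below also catches.
    candidate = os.path.realpath(os.path.join(root_real, filename))
    inside = os.path.commonpath([root_real, candidate]) == root_real
    if not inside or candidate == root_real:
        raise ValueError(f"refusing upload path outside {root_real!r}: {filename!r}")
    return candidate
```

A name like `../cgi-bin/x.cgi` or `/etc/passwd` raises `ValueError` here, while a plain `photo.jpg` resolves to a path under the upload root.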

If you've never seen or heard of this, ¯\_(ツ)_/¯

> Unix doesn't have folders

Great and very important point. Someone should go fix all of these bugs:

https://github.com/search?q=repo%3Atorvalds%2Flinux%20folder...

I've certainly heard of that problem, but I've never experienced it, because it's easy to avoid. At least, it's easy if you're not running certain pieces of software. I'd suggest not using WordPress (or, ideally, PHP) and disabling ExecCGI in whatever directories you need to host untrusted executables in.

Of course, disabling ExecCGI in one directory won't help if you do have path traversal holes in your upload-handling code. I'm not convinced that disabling CGI will help if attackers can use a path traversal hole to upload malicious executables to arbitrary paths you can write to. They can overwrite your .bashrc or your FastCGI backend program or whatever you're likely to execute. CGI seems like the wrong thing to blame for that.

Why are you linking me to a "Sign in to search code on GitHub" page?

I feel it necessary to clarify that I am not suggesting we should use single-threaded servers. My go-to approach for one-offs is Go HTTP servers and reverse proxying. This will do quite well to utilize multiple CPU cores, although admittedly Go is still far from optimal.

Still, even when people run single-threaded event-loop servers, you can run an instance per CPU core; I recall this being common for WSGI/Python.
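A bare-bones sketch of that instance-per-core pattern, assuming a POSIX system with `os.fork`: the parent opens one listening socket, then forks one single-threaded worker per core, and the kernel hands each incoming connection to whichever worker is waiting in `accept`. The function names are hypothetical and the "HTTP" handling is deliberately minimal.

```python
import os
import signal
import socket


def serve_forever(listener: socket.socket) -> None:
    """Single-threaded accept loop; one of these runs in each worker."""
    while True:
        conn, _addr = listener.accept()
        with conn:
            try:
                conn.recv(65536)  # read the request and ignore its contents
                body = f"handled by pid {os.getpid()}\n".encode()
                head = b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body)
                conn.sendall(head + body)
            except OSError:
                pass  # a broken client connection should not kill the worker


def start_workers(listener: socket.socket, count=None) -> list:
    """Fork `count` workers (default: one per CPU core) sharing the listener."""
    pids = []
    for _ in range(count or os.cpu_count() or 1):
        pid = os.fork()
        if pid == 0:  # child process
            try:
                serve_forever(listener)
            finally:
                os._exit(0)
        pids.append(pid)
    return pids


def stop_workers(pids: list) -> None:
    """Terminate the workers and reap them so no zombies are left behind."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    for pid in pids:
        os.waitpid(pid, 0)
```

Typical use would be `listener = socket.create_server(("127.0.0.1", 8000))` followed by `start_workers(listener)`; the port number is just an example.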