It is really unfortunate that Linux does not do proper async disk I/O. Then again, for lots of websites, the static assets stored on disk fit in the OS cache, so the boost won't be nearly as big.
I'm not sure what you mean by this. Linux has supported non-blocking I/O using select and poll since at least 2.4. 2.6 added epoll, which scales even further, since the cost of waiting depends on the number of ready descriptors rather than the number being watched.
It's a fairly common practice to spawn 2n processes/threads (n processors) to allow half to block on I/O and system calls though.
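A minimal sketch of that 2n rule of thumb, using a thread pool. The helper names worker_count and run_blocking_tasks are made up for illustration, and the factor of 2 is just the heuristic from the comment above, not a tuned value:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def worker_count(cpus=None):
    """Return 2n workers for n processors (hypothetical helper)."""
    # os.cpu_count() can return None, so fall back to 1 processor.
    n = cpus if cpus is not None else (os.cpu_count() or 1)
    return 2 * n


def run_blocking_tasks(tasks):
    """Run callables that may block on I/O across 2n threads."""
    with ThreadPoolExecutor(max_workers=worker_count()) as pool:
        # pool.map preserves the order of the input tasks.
        return list(pool.map(lambda task: task(), tasks))
```

The idea is that while roughly half the workers sit blocked in a read or a system call, the other half can still keep every processor busy.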
Well, TFA talks about Linux not having great support for async IO for the filesystem. You can use O_DIRECT and get async IO that way, but that completely bypasses the OS cache, so it's not a great way to do it, at least not for nginx. Just read the article to see the details.
Note that kqueue(2) in BSD-land supports a unified interface for async I/O for both sockets and files, so you can have a proper event loop without having to resort to reading files in a thread pool. If Linux had something similar, nginx wouldn't need to integrate a thread pool for this (though it might for other things, such as CPU-intensive plugins).
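For anyone unfamiliar with the "read files in a thread pool" workaround, here is a rough sketch of the pattern using Python's asyncio. This is not how nginx implements it; the function names are hypothetical and the temporary file only stands in for a static asset:

```python
import asyncio
import os
import tempfile


def read_file_blocking(path):
    # Ordinary blocking read; it may stall on disk on a page cache miss.
    with open(path, "rb") as f:
        return f.read()


async def read_file_offloaded(path):
    # Run the blocking read on the loop's default thread pool so the
    # event loop itself stays free to service sockets in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, read_file_blocking, path)


def demo():
    # Write a small temporary file, then read it back through the pool.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"static asset")
        path = tmp.name
    try:
        return asyncio.run(read_file_offloaded(path))
    finally:
        os.unlink(path)
```

The event loop never calls read() itself, so a slow disk only ties up a pool thread rather than every connection served by that loop.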
The nginx threadpools aren't strictly for I/O. One of the other major issues TFA mentions is that plugins don't use epoll/kqueue, and they block (with all the associated performance costs).
The detail I apparently skipped is that uncached file reads aren't handled uniformly through epoll (which I'm surprised about). I don't see why files should be handled any differently than sockets, etc. with regard to non-blocking I/O using epoll.
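You can see the difference directly on Linux: epoll accepts a socket but refuses a regular file outright. A small sketch, assuming a Linux host (select.epoll does not exist elsewhere); epoll_register_errno and demo are names made up for this example:

```python
import errno
import select
import socket
import tempfile


def epoll_register_errno(fd):
    """Try to register fd with epoll; return None on success, else errno."""
    ep = select.epoll()
    try:
        ep.register(fd, select.EPOLLIN)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        ep.close()


def demo():
    # Compare a socket against a regular (temporary) file.
    a, b = socket.socketpair()
    try:
        with tempfile.TemporaryFile() as f:
            return (
                epoll_register_errno(a.fileno()),
                epoll_register_errno(f.fileno()),
            )
    finally:
        a.close()
        b.close()
```

On Linux the socket registers fine while the regular file fails with EPERM, because regular files have no readiness notification to offer: the kernel treats them as always ready, even when the read would have to wait on disk.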
My broader issue, though, is that everyone tends to look to the functions starting with aio_ to do asynchronous I/O. Those (POSIX AIO) are fairly bad interfaces and inefficient (effectively thread pools). Using the nginx model with an epoll/kqueue event loop is a better architecture.