| There is no sensible OS API that could support this, because fundamentally memory access is a hardware API. The OS isn’t involved in normal memory reads, because that would be ludicrously inefficient, effectively requiring a syscall for every memory operation, which effectively means a syscall for any operation involving data I.e. all operations. Memory operations are always synchronous because they’re performed directly as a consequence of CPU instructions. Reading memory that’s been paged out results in the CPU itself detecting that the virtual address isn’t in RAM, and performing a hardware level interrupt. Literally abandoning a CPU instruction mid execution to start executing an entirely separate set of instructions which will hopefully sort out the page fault that just occurred, then kindly ask the CPU to go back and repeat the operation that caused the page fault. OS is only involved only because it’s the thing that provided the handling instructions for the CPU to execute in the event of a page fault. But it’s not in anyway actually capable of changing how the CPU initially handles the page fault. Also the current model does allow other threads to continue executing other work while the page fault is handled. The fault is completely localised to individual thread that triggered the fault. The CPU has no concept of the idea that multiple threads running on different physical core are in anyway related to each other. It also wouldn’t make sense to allow the interrupted thread to someone kick off a separate asynchronous operation, because where is it going to execute? The CPU core where the page fault happened is needed to handle the actual page fault, and copy in the needed memory. So even if you could kick off an async operation, there wouldn’t be any available CPU cycles to carry out the operation. Fundamentally there aren’t any sensible ways to improve on this problem, because the problem only exists due to us pretending that our machines have vastly more memory than they actually do. Which comes with tradeoffs, such as having to pause the CPU and steal CPU time to maintain the illusion. If people don’t like those tradeoffs, there’s a very simple solution. Put enough memory in your machine to keep your entire working set in memory all the time. Then page faults can never happen. |
Not only is there a sensible OS API that could support this, Linux already implements it; it's the SIGSEGV signal. The default way to respond to a SIGSEGV is by exiting the process with an error, but Linux provides the signal handler with enough information to do something sensible with it. For example, it could map a page into the page frame that was requested, enqueue an asynchronous I/O to fill it, put the current green thread to sleep until the I/O completes, and context-switch to a different green thread.
Invoking a signal handler only has about the same inherent overhead as a system call. But then the signal handler needs another couple of system calls. So on Linux this is over a microsecond in all. That's probably acceptable, but it's slower than just calling pread() and having the kernel switch threads.
Some garbage-collected runtimes do use SIGSEGV handlers on Linux, but I don't know of anything using this technique for user-level virtual memory. It's not a very popular technique in part because, like inotify and epoll, it's nonportable; POSIX doesn't specify that the signal handler gets the arguments it would need, so running on other operating systems requires extra work.
im3w1l also mentions userfaultfd, which is a different nonportable Linux-only interface that can solve the same thing but is, I think, more efficient.