It's not even just the semantics; the performance is awful. Even when the fork is virtual (as any modern fork is) and there's no memory copying because it's COW, all the kernel page tables still need to be copied and for a multi-GB process that's nontrivial. That's why any sane large service that needs to fork anything will early on start up a slave subprocess whose only job is to fork quickly when the master process needs it.
>all the kernel page tables still need to be copied and for a multi-GB process that's nontrivial
Only in the pathological case where the large process is backed solely by 4 KB pages. The hardware has long supported large pages (on x86 since the Pentium Pro, if memory serves) and huge pages. The popular OSes (Linux 2.6+ and Windows 2003+) also support large and huge pages.
A 2 GB process can easily be three pages: r/x code, r/w stack, r/w data (2 GB). Granted, it gets a bit more complex if mmapped I/O or JIT are used, but since both are mature technology now, it's fine to point fingers at any inefficiency and demand better. Another caveat would probably be shared libraries loading at separate address ranges, which, IMO, is another reason to ditch shared libraries for good.
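To put rough numbers on the page-size argument, here is a back-of-the-envelope sketch. It assumes x86-64 with 8-byte page-table entries and a region aligned to the page size in use. It counts only the leaf entries; the upper levels, and whatever the kernel actually chooses to copy at fork time, are not modelled. The function names are made up for illustration.

```python
# Rough count of leaf page-table entries needed to map a region on x86-64.
# Assumptions: 8-byte entries, region aligned to the page size being used.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
ENTRY_BYTES = 8  # size of one x86-64 page-table entry


def leaf_entries(region_bytes: int, page_bytes: int) -> int:
    """Number of leaf entries needed to map region_bytes with one page size."""
    return -(-region_bytes // page_bytes)  # ceiling division


def leaf_table_bytes(region_bytes: int, page_bytes: int) -> int:
    """Bytes taken by the leaf entries alone; upper levels add a little more."""
    return leaf_entries(region_bytes, page_bytes) * ENTRY_BYTES


if __name__ == "__main__":
    region = 2 * GIB
    for name, page in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
        print(name, leaf_entries(region, page), leaf_table_bytes(region, page))
```

For a 2 GiB region that works out to 524288 entries (4 MiB of leaf tables) with 4 KiB pages, 1024 entries with 2 MiB pages, and 2 entries with 1 GiB pages.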
Contrary to popular wisdom, OS research is still relevant.
You want to ditch shared libraries and mmap to map your big processes using GB pages to make fork fast again (despite it not being the main and only drawback)???
OS research might be relevant, and it's good that some people have wild ideas, but honestly I doubt this one will go anywhere :P
About shared libraries, I know that there is this line of thought considering them "evil" (well at least sufficiently to want to get rid of them); but I'm quite unsure about what a modern system would look like without them (although this is less a problem at the application level on e.g. Android, the system level is still extremely important)
With Spectre, proper process bounds (well, address spaces) are more important than ever -- and oh well even without that I'd still have cited them as incredibly important, in the sense that I'd rather have more than fewer. Given that, code reuse involves shared libraries, for several good reasons; the obvious one being not wasting RAM, but then there is the update problem (how to patch programs when security holes are discovered, especially if multiple parties are involved), and on top of that there is the cache pollution problem, which is related to the code duplication problem, and which is quite insidious because it is probably simultaneously hard to benchmark and very real (ambient loss of perf, just not in very hot paths, but this will still have an impact on the general perf of a system, quite like Spectre mitigations are having a big impact)
Now we could like address space boundaries so much that we would want to just use even MORE processes in place of shared libraries, but this obviously does not work for all services (and Spectre is biting us again because context switches are not cheap), plus if you take it to the extreme this makes systems extremely hard to design, and even bigger. This is part of the reasons we are using Linux instead of Hurd... (well Linux is too much in the opposite direction, but there are hopes that it will in the long term evolve toward a middle ground)
And anyway that does not fit the narrative at all of using more huge pages.
Now there are the usual radical ideas about how everything should be running on some kind of VM (sometimes even including the kernel), drastically reducing the amount of "native" code; but in the reality of our current systems that "everything" relies on multiple VMs, and I doubt it will converge on only one, nor should it (because of the monoculture this would induce). Plus the ambient perf is still lower than native code, and TBH I don't expect that to change ever.
So, why and how would you like to get rid of shared libraries?
We are using Linux instead of Hurd due to manpower.
Most high integrity real time OSes are microkernels.
Interesting that you mention Android, one of the key points of Project Treble is using separate processes for drivers with Android IPC to talk to the kernel (including hardware buffer handles).
> Contrary to popular wisdom, OS research is still relevant.
Is it really popular wisdom though, or is it the opinion of one person and it got hyped up, much like the same hype happened on a subpar programming language that same person worked on?
> That's why any sane large service that needs to fork anything will early on start up a slave subprocess whose only job is to fork quickly when the master process needs it.
I don't think that's (entirely) true. This is more because a large service with some potent master process will have said process Do Stuff(tm) that will involve opening files, threads, signal handling, or whatever things that need to be taken care of one way or the other when forking to a worker (or whatever other child) process. It's therefore much simpler to fork a master subprocess into a child spawner earlier on, when it has yet to do anything. You significantly reduce your chances of screwing up if you have nothing to clean up for.
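A minimal sketch of that "fork the spawner early" pattern, under some assumptions: the names `start_spawner`, `_serve` and `run_via_spawner` are invented here, the wire format (one NUL-separated argv per line, one exit status per line back) is just the simplest thing that works, and arguments containing newlines are not handled. A real service would want something sturdier.

```python
import os
import socket
import subprocess


def start_spawner():
    """Fork a helper while the process is still small.

    Call this before loading data, opening files, or starting threads.
    Returns (sock, pid): the parent's end of the channel and the helper's pid.
    """
    parent_sock, helper_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        parent_sock.close()
        try:
            _serve(helper_sock)
        finally:
            os._exit(0)  # never fall back into the parent's code path
    helper_sock.close()
    return parent_sock, pid


def _serve(sock):
    """Helper loop: read one argv per line, run it, write back its exit status."""
    reader = sock.makefile("r")
    writer = sock.makefile("w")
    for line in reader:  # ends when the parent closes its socket
        argv = line.rstrip("\n").split("\0")
        try:
            status = subprocess.call(argv)
        except OSError:
            status = 127  # could not start the program
        writer.write(f"{status}\n")
        writer.flush()


def run_via_spawner(sock, argv):
    """Ask the helper to run argv and block until it reports the exit status."""
    sock.sendall(("\0".join(argv) + "\n").encode())
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = sock.recv(64)
        if not chunk:
            raise RuntimeError("spawner exited")
        reply += chunk
    return int(reply.decode())
```

The big process only ever writes a short message to a socket; the fork and exec happen in the helper, which has no threads, no large heap, and nothing to clean up.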
That's true, it's not the only reason. Dealing with threads and buffers and pthread_atfork and the associated heartbreak is a biggie also. But the performance is nothing to laugh at.
I just did a quick test: a 100 MB process generally takes >2 ms to fork, while a process of 1 MB or less takes 70 us. It seems like it's pretty much linear with process size.
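For anyone who wants to repeat that kind of measurement, here is one way it could be done. This is not the test used above, just a sketch: `time_fork` is a made-up helper, it measures only how long fork() takes to return in the parent, and the ballast is touched page by page so that the memory is actually resident rather than lazily mapped. Numbers will vary with kernel, page size, and allocator behaviour.

```python
import os
import time


def time_fork(resident_bytes: int, repeats: int = 20) -> float:
    """Median seconds for fork() to return in the parent with extra touched heap."""
    page = os.sysconf("SC_PAGE_SIZE")
    ballast = bytearray(resident_bytes)
    for offset in range(0, resident_bytes, page):
        ballast[offset] = 1  # touch each page so it gets a page-table entry

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child does nothing and exits immediately
        samples.append(time.perf_counter() - start)
        os.waitpid(pid, 0)  # reap the child outside the timed region

    samples.sort()
    return samples[len(samples) // 2]


if __name__ == "__main__":
    for mb in (1, 10, 100):
        seconds = time_fork(mb * 1024 * 1024)
        print(f"{mb:4d} MB ballast: {seconds * 1e6:.0f} us per fork")
```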
The performance is awful, but in return you get the COW memory you mentioned. That's a pretty huge benefit for a lot of programs with huge, seldom-changing memory state at startup. If those programs want to parallelize themselves without duplicating that memory or paying startup time/CPU overhead, fork() is a pretty handy way to achieve that.
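A small sketch of that use of fork() without exec: the parent holds a large read-only buffer, and each forked worker reads its slice of it directly. `parallel_sum` is a hypothetical example function and assumes `workers >= 1`. One caveat specific to this sketch: CPython's reference counting writes to object headers, so a few pages do get copied, but the buffer's payload is only read and should stay shared.

```python
import os


def parallel_sum(data: bytes, workers: int) -> int:
    """Sum the bytes of data using forked workers that read the parent's memory."""
    chunk = -(-len(data) // workers)  # ceiling division
    children = []
    for i in range(workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Worker: data is already here via copy-on-write, nothing to send.
            try:
                os.close(r)
                part = sum(memoryview(data)[i * chunk:(i + 1) * chunk])
                os.write(w, str(part).encode())
                os.close(w)
            finally:
                os._exit(0)
        os.close(w)
        children.append((pid, r))

    total = 0
    for pid, r in children:
        with os.fdopen(r, "rb") as result:
            total += int(result.read().decode())
        os.waitpid(pid, 0)
    return total
```

Only the small per-worker result crosses a pipe; the input is never serialized or duplicated up front.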
These days, you can usually start a process without forking, through posix_spawn/vfork. Although I gather some servers still fork so they can set the current working directory more easily.
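As an illustration of the spawn route, Python exposes posix_spawn as `os.posix_spawn` / `os.posix_spawnp` (3.8+), with file actions for open, close and dup2. The helper name below is invented, and `os.waitstatus_to_exitcode` needs 3.9+. As far as I know, this wrapper has no file action for changing the working directory, which fits the remark above.

```python
import os


def spawn_with_stdout(argv, out_path):
    """Run argv with stdout redirected to out_path, with no fork() in our code.

    Returns the child's exit code.
    """
    file_actions = [
        # In the child, open out_path as fd 1 (stdout) before the program starts.
        (os.POSIX_SPAWN_OPEN, 1, out_path,
         os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644),
    ]
    pid = os.posix_spawnp(argv[0], argv, os.environ, file_actions=file_actions)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```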
* redirect stdin, stdout, and stderr
* open files that might be needed and close files that aren't
* change process limits
* drop privileges
* change the root directory
* change namespaces
And there are a few other things I am probably forgetting.
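Several of the items in that list can be sketched as ordinary calls made in the child between fork and exec. This is only an illustration: `fork_exec` and its parameters are made up, it assumes `max_open_files` does not exceed the current hard limit, and the privileged steps (chroot, dropping privileges, namespaces) are left as comments because they need root.

```python
import os
import resource


def fork_exec(argv, stdout_path, cwd, max_open_files):
    """Fork, adjust the child's state with plain syscalls, then exec argv."""
    pid = os.fork()
    if pid == 0:
        try:
            # Redirect stdout to a file.
            fd = os.open(stdout_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            if fd != 1:
                os.dup2(fd, 1)
                os.close(fd)
            # Change a process limit (soft limit only; hard limit untouched).
            _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
            resource.setrlimit(resource.RLIMIT_NOFILE, (max_open_files, hard))
            # Change the working directory.
            os.chdir(cwd)
            # Privileged steps would go here, chroot first and then dropping
            # privileges: os.chroot(...), os.setgid(...), os.setuid(...).
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if setup or exec failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Each step reuses an existing call that already acts on the current process, which is the appeal of doing this after fork rather than through one spawn function with a parameter for everything.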
I'm not sure I agree it's the ideal way to do it. That's a heck of a lot of work for one function to do, and it necessarily duplicates the functionality of a ton of other functions. And that's ignoring the fact that forking without ever exec'ing can be really useful in many cases.
I haven't yet read the paper, but considering the incredible simplicity from the programmer's PoV that fork provides, and the fact that at least Linux makes it pretty god damn fast, especially compared to Windows' non-forking model, I can't really see myself agreeing with their conclusion.
When you read the paper, you'll see this covered in section 6 ("REPLACING FORK"), subsection "Low-level: Cross-process operations".
> While a spawn-like API is preferred for most instances of starting a program, for full generality it requires a flag, parameter, or new helper function controlling every possible aspect of process state. It is infeasible for a single OS API to give complete control over the initial state of a new process. ...
> clean-slate designs [e.g., 40, 43] have demonstrated an alternative model where system calls that modify per-process state are not constrained to merely the current process, but rather can manipulate any process to which the caller has access ...
> Retrofitting cross-process APIs into Unix seems at first glance challenging, but may also be productive for future research.