Mmm... I would beg to differ. I have ported stuff to NOMMU Linux and almost everything worked just as on a "real" Linux: threads, processes (only vfork, no fork), networking, priorities, you name it. DOS gives you almost nothing. It has files.
The one thing different to a regular Linux was that a crash of a program was not "drop into debugger" but "device reboots or halts". That part I don't miss at all.
This was interesting. It reminded me how weird fork() is, and I found an explanation for its weirdness that loops back to this conversation about nommu:
"Originally, fork() didn't do copy on write. Since this made fork() expensive, and fork() was often used to spawn new processes (so often was immediately followed by exec()), an optimized version of fork() appeared: vfork() which shared the memory between parent and child. In those implementations of vfork() the parent would be suspended until the child exec()'ed or _exit()'ed, thus relinquishing the parent's memory. Later, fork() was optimized to do copy on write, making copies of memory pages only when they started differing between parent and child. vfork() later saw renewed interest in ports to !MMU systems (e.g: if you have an ADSL router, it probably runs Linux on a !MMU MIPS CPU), which couldn't do the COW optimization, and moreover could not support fork()'ed processes efficiently.
Another source of inefficiency in fork() is that it initially duplicates the address space (and page tables) of the parent, which may make running short programs from huge programs relatively slow, or may make the OS deny a fork() thinking there may not be enough memory for it (to work around this one, you could increase your swap space, or change your OS's memory overcommit settings). As an anecdote, Java 7 uses vfork()/posix_spawn() to avoid these problems.
On the other hand, fork() makes creating several instances of the same process very efficient: e.g., a web server may have several identical processes serving different clients. Other platforms favour threads, because the cost of spawning a different process is much bigger than the cost of duplicating the current process, which can be just a little bigger than that of spawning a new thread. Which is unfortunate, since shared-everything threads are a magnet for errors."
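The vfork() semantics in that quote (the parent is suspended, and the child borrows its memory until it calls exec or _exit) force a specific calling idiom. Here is a minimal sketch of it, assuming a Linux/glibc-like environment; spawn_and_wait is a hypothetical helper name, not an existing API.

```c
#define _DEFAULT_SOURCE  /* make sure glibc declares vfork() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` via vfork()+execve() and wait for it.
 * Returns the child's exit status, or -1 on error. */
int spawn_and_wait(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;                 /* vfork() itself failed */
    if (pid == 0) {
        /* Child: it shares the parent's memory, so it must do nothing
         * here except execve() or _exit(). */
        execve(path, argv, envp);
        _exit(127);                /* exec failed; never return or call exit() */
    }
    /* Parent: resumes only once the child has exec'ed or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The failure path uses _exit() rather than exit() because exit() would run the parent's atexit handlers and flush its stdio buffers, all of which live in the memory the child is only borrowing.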
That's fair. If so, then you can still have things like drivers and a HAL and so on too. However, there are no hard security barriers.
How do multiple processes actually work, though? Is every executable position-independent? Does the kernel provide the base address(es) in register(s) as part of vfork? Do process heaps have to be constrained so they don't get interleaved?
There are many options. Executables can be position-independent, or relocated at run-time, or the device can have an MPU or equivalent registers (for example 8086/80286 segment registers), which is related to an MMU but much simpler.
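To make "relocated at run-time" concrete: the loader copies the image to wherever memory is free and then patches every stored address. The sketch below assumes a simplified, hypothetical format (image linked at address 0, plus a table of byte offsets of 32-bit address words); relocate_image is an illustrative name, and real no-MMU binary formats are more involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified loader step. The image was linked as if it
 * started at address 0; `relocs` lists the byte offsets of every 32-bit
 * word that holds an address. Loading at `base` means adding `base` to
 * each listed word. */
void relocate_image(uint8_t *image, size_t image_len,
                    const uint32_t *relocs, size_t nrelocs, uint32_t base)
{
    for (size_t i = 0; i < nrelocs; i++) {
        size_t off = relocs[i];
        /* Skip entries that would read or write past the end of the image. */
        if (image_len < sizeof(uint32_t) || off > image_len - sizeof(uint32_t))
            continue;
        uint32_t word;
        memcpy(&word, image + off, sizeof word);   /* safe for unaligned offsets */
        word += base;                              /* link-time addr -> run-time addr */
        memcpy(image + off, &word, sizeof word);
    }
}
```

Only the listed words change; ordinary data that merely looks like an address is left alone, which is why the relocation table has to come from the linker.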
Executables in a no-MMU environment can also share the same code/read-only segments between many processes, the same way shared libraries can, to save memory and, if run-time relocation is used, to reduce the relocation work.
The original design of UNIX ran on machines without an MMU, and they had fork(). Andrew Tanenbaum's classic book, which comes with Minix for teaching OS design, explains how to fork() without an MMU, as Minix runs on machines without one.
For spawning processes, vfork()+execve() and posix_spawn() are much faster than fork()+execve() from a large process in no-MMU environments though, and almost everything runs fine with vfork() instead of fork(), or threads. So no-MMU Linux provides only vfork(), clone() and pthread_create(), not fork().
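For the posix_spawn() route, a minimal sketch is below. posix_spawnp() lets the C library choose the underlying mechanism (vfork(), clone(), or fork()), so the same call can work where fork() is unavailable, assuming the no-MMU C library in use implements it. run_command is a hypothetical helper name.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up in PATH) and wait for it.
 * Returns the child's exit status, or -1 on error. */
int run_command(char *const argv[])
{
    pid_t pid;
    /* No file actions or attributes; the child inherits our environment. */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0)
        return -1;               /* posix_spawnp() returns an errno value */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Unlike the raw vfork() idiom, there is no child-side code to get wrong: redirections and similar setup go through posix_spawn_file_actions_t instead.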
Thanks! I was able to find some additional info on no-MMU Linux [1], [2], [3]. It seems position-independent executables are the norm on regular (MMU) Linux now anyway (and probably have been for a long time). I took a look under the covers of uClibc and it seems like malloc just delegates most of its work to mmap, at least for the malloc-simple implementation [4]. That implies to me that different processes' heaps can be interleaved (without overlapping), but the kernel manages the allocations.
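A minimal sketch of the "malloc delegates to mmap" idea: each request becomes its own anonymous mapping, with a small header recording the size for munmap(). This is a hypothetical illustration (simple_malloc/simple_free are made-up names), not uClibc's actual code. Because the kernel picks the address of every mapping, blocks from different processes can end up interleaved.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Header stored in front of each block; the union keeps the user
 * pointer aligned for any standard type. */
typedef union {
    size_t size;          /* total bytes mapped, header included */
    max_align_t align;
} block_header;

void *simple_malloc(size_t n)
{
    if (n > (size_t)-1 - sizeof(block_header))
        return NULL;                       /* size would overflow */
    size_t total = n + sizeof(block_header);
    /* One anonymous mapping per allocation; the kernel picks the address. */
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    block_header *h = p;
    h->size = total;
    return h + 1;                          /* user memory starts after header */
}

void simple_free(void *ptr)
{
    if (ptr == NULL)
        return;
    block_header *h = (block_header *)ptr - 1;
    munmap(h, h->size);                    /* return the whole mapping */
}
```

The trade-off is that every allocation costs at least a page and a system call, which is acceptable for a tiny allocator but wasteful for many small objects.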
Under uClinux, executables can be position independent or not. They can run from flash or RAM. They can be compressed (if they run in RAM). Shared libraries are supported on some platforms. All in all it's a really good environment and the vfork() limitation generally isn't too bad.
I spent close to ten years working closely with uClinux (a long time ago). I implemented the shared library support for the m68k. Last I looked, gcc still included my additions for this. This allowed execute-in-place for both executables and shared libraries, a real space saver. Another guy on the team managed to squeeze the Linux kernel, a reasonable user space and a full IPsec implementation into a unit with 1 MB of flash and 4 MB of RAM, which was pretty amazing at the time (we didn't think it was even possible). Better still, power-on to login prompt took well under two seconds.
> The original design of UNIX ran on machines without an MMU, and they had fork().
The original UNIX also did not have virtual memory as we know it today: page cache, dynamic I/O buffering, memory-mapped files (mmap(2)), shared memory, etc.
They all require a functioning MMU, without which the functionality would be severely restricted (but not entirely impossible).
The no-MMU version of Linux has all of those features, except that memory-mapped files (mmap) are limited. They work the same as in MMU Linux: page cache, dynamic I/O buffering, shared memory. No-MMU Linux also supports other modern memory-related features, such as tmpfs and futexes. I think it even supports io_uring.
The original UNIX literally swapped processes: it wrote all of a process's memory to disk and read another program's state from disk into memory. The number of processes it could run was therefore limited by how many times larger swap was than core. That is a wholly unacceptable design nowadays.
In an embedded scenario where the complete set of processes that are going to be running at the same time is known in advance, I would imagine that you could even just build the binaries with the correct base address in advance.
A common trick to decrease code size in RAM is to link everything into a single program, which then checks its argv[0] to decide which program's code to run.
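A minimal sketch of that argv[0] dispatch (BusyBox is the best-known real example of such a multi-call binary). The applets here are hypothetical stand-ins for whole programs, and dispatch is an illustrative name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical applets standing in for whole programs. */
static int true_main(int argc, char **argv)  { (void)argc; (void)argv; return 0; }
static int false_main(int argc, char **argv) { (void)argc; (void)argv; return 1; }
static int echo_main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        printf("%s%s", argv[i], i + 1 < argc ? " " : "");
    putchar('\n');
    return 0;
}

struct applet {
    const char *name;
    int (*run)(int, char **);
};

static const struct applet applets[] = {
    { "true",  true_main  },
    { "false", false_main },
    { "echo",  echo_main  },
};

/* "/bin/true" and "true" should both select the same applet. */
static const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Run the applet named by argv[0]; 127 means "no such applet". */
int dispatch(int argc, char **argv)
{
    if (argc < 1 || argv[0] == NULL)
        return 127;
    const char *name = base_name(argv[0]);
    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
        if (strcmp(applets[i].name, name) == 0)
            return applets[i].run(argc, argv);
    fprintf(stderr, "%s: applet not found\n", name);
    return 127;
}
```

A real main() would simply return dispatch(argc, argv), and each command name would be a symlink or hard link to the one binary, so only a single copy of the shared code and libc needs to sit in flash or RAM.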
With the right filesystem (certain kinds of read-only), the code (text segment) can even be mapped directly, and no loading into RAM need occur at all.
These approaches save memory even on regular MMU platforms.
Not having an MMU puts you more into the territory of DOS than UNIX. There is FreeDOS but I'm pretty sure it's x86-only.