Hacker News new | ask | show | jobs
by ameliaquining 17 days ago
I read him as arguing that overcommit was a mistake. Of course, he doesn't answer any of the obvious follow-up questions, such as, does fork–exec copy all the process's memory and then immediately throw it away, or what. (One could argue that fork–exec was also a mistake, but it long predates Linux, so this doesn't answer the question of how Torvalds should have designed it.)
3 comments

> does fork–exec copy all the process's memory

NT: Yes? Why not?

(note that this refers to the Windows NT kernel's operation because it had historically a POSIX emulation layer (NT Personalities), not the modern WSL which is just Linux in a Hyper-V)

because this is what causes Windows to use ~80% more memory than unixes
Well, in that case it's a good thing I guess. Windows is orders of magnitude better when it comes to memory management on the desktop compared to Linux. Like why would I even want a single process killed by OOM killer? On Windows things just work, or get slow. On Linux it works and then mayhem ensues.

Last year I was writing a reply on a forum in Firefox on Linux when the OOM killer decided to nuke Firefox. Poof gone, mid keystroke. How does anyone think that's acceptable?

This was on a stock Linux distro, nothing special.

> Windows is orders of magnitude better when it comes to memory management on the desktop compared to Linux.

The bar is pretty low, but the windows scheduler is aware what the currently focussed app is so it can prioritise not killing it.

On Linux? Not so much.

Actually, it depends on the Windows scheduler settings. On Windows Server, the default is to kill the foreground process (on the assumption that it is just a management app rather than a critical server component).
In either case, Windows tries a lot of things to avoid killing processes. Which at least in a desktop setting is an infinitely better approach than random beheadings without warning.
Processes are usually spawned with CreateProcess. There's no fork in win32.
Windows doesn't use fork/exec for process creation in any relevant way today

There are Native APIs for implementing fork (needed for the obsolete POSIX subsystem, primarily), but even on the Native API side, processes are usually spawned through NtCreateProcess or RtlCreateUserProcess, though there is a bunch of setup with regards to the Csr APIs for the Win32 CreateProcess[1]).

[1]: https://stackoverflow.com/a/69605729/2805120

> does fork–exec copy all the process's memory and then immediately throw it away, or what

No, you just account for it (commit the charge) in the bookkeeping. If a 1GB process forks, you decrement the amount of free memory by 1GB to ensure other processes don't overcommit such that you won't have 1GB of free memory if and when you actually needed to allocate that memory. If the forked process immediately exits, you just bump the free memory counter back up. This is what Solaris and Windows do.

But precise accounting of memory is difficult if you didn't design for it in the first place. For example, you have to figure in the memory needed for page structures. (Though I think Linux can do that in particular, bugs notwithstanding.) Last time I checked (5+ years ago) Linux was incapable of such precise accounting across the board, so even if you disabled overcommit the kernel could still find itself in an OOM situation when the time comes to allocate memory it already promised or perform an operation it implicitly or explicitly guaranteed it could complete.

The expectation that Linux overcommits meant many Linux kernel developers didn't design subsystems in a way that the kernel as a whole could provide reliable, guaranteed, precise memory accounting. For example, some filesystems rely on being able to use the OOM killer to free up memory needed for an operation that it can't back out of once it starts because it wasn't written in a way that it could either predetermine or bound it's memory requirements, or cleanly back out of an operation it started.

To be fair I'm not sure any of the BSDs can do it either, at least when it comes to fork and CoW. IIRC, nor can macOS, though it will dynamically add swap so you won't get an OOM kill until you run out of disk space.

Well, Windows doesn't have fork–exec so there's no problem with a 15 GB process spawning a 15 MB subprocess. Whereas doing that on Linux without overcommit requires there to be 15 GB free. vfork and posix_spawn work around this, but lots of existing code doesn't use them, vfork is notoriously hard to use correctly, and posix_spawn doesn't (and doesn't try to) cover all fork–exec use cases.
NtCreateUserProcess supports copy-on-write fork semantics without overcommit. See https://github.com/huntandhackett/process-cloning#cloning-fo... And there's a wrapper, RtlCloneUserProcess, that is called using the same traditional fork pattern.

Precise memory accounting and CoW fork aren't intrinsically antagonistic, and the general ability to clone CoW mappings or similar kernel structures is useful beyond fork, which is why NT had all the necessary facilities in the kernel (it's the userspace CRT state that can be tricky, especially in the presence of threads, which is true on Unix systems as well).

The example of forking a process with a giant VM space just to exec some other program is, IMO, a straw man. Processes with such huge RW mappings typically don't fork and exec like that. Nobody architecting an app like PostgreSQL was relying on the ability to easily fork processes for minor tasks or exec utilities from processes already forked for resource intensive tasks. And when such a thing is desirable, it's easy enough to use the alternatives, like vfork, or architect a controller for spawning subprocesses, or just use threads. Heck, fork existed long before CoW. Expectations around fork, that you can and should be able to call it without any forethought about resource management was a consequence of Linux' popularity.

Linux embraced overcommit because people wanted to run existing big iron applications like networked databases on tiny PCs with fractions of the memory those applications were written to expect to be able to use. Overcommit was a hack that let your play around with those applications without them immediately falling over, partly because back then such applications often preallocated memory for cache, etc, but would never use all of it when running in an environment like early Linux, which would never see the same high loads and utilization as big iron servers.

Linux could have pivoted in the other direction and pursued strict memory accounting with the ability to expressly overcommit in, e.g., some process subtrees or dynamically allocate swap (which in the expected scenario it normally wouldn't have to actually do). But like most userspace developers they found it easier to write kernel code when they could pretend memory was infinite, and when the system hit the wall just blow up and blame the user. That choice can be defensible for userspace, but it's simply not defensible for a kernel.

To be 100% fair, it's rare that processes are cloned on Windows, if only because it's part of the Native API that applications generally don't use directly, and CreateProcess is easier and does all the housekeeping stuff, etc, that people writing Windows applications generally come to expect (or don't even know happens)

I do think overcommit was a poor design choice, but I think it probably mostly does logically follow from the fact that fork and friends are the only ways available to create a process that's available to userspace. It's quite unfortunate though.

Part of the problem is that some applications wanted to reserve lots of address space but didn't necessarily want to touch it right away (such as when they were using it sparsely). Something that VirtualAlloc(x, MEM_RESERVE) (or mmap(..., MAP_NORESERVE)) would be suited for. But while malloc exists, mreserve doesn't in libc, and I think it was pretty uncommon to use it.

Fork should be replaced by vfork (or something better) in almost all situations.