Some programs allocate a lot of virtual memory and then don't use it.
Also, Linux's forking model can result in a lot of virtual memory being allocated if a heavyweight program tries to fork+exec a lot of smaller programs, since fork+exec is not atomic and briefly doubles the virtual memory usage of the original program.
I think there are better ways to spawn programs that don't suffer from this problem now...
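One such alternative is posix_spawn(3), which Python exposes as os.posix_spawn. A minimal sketch, assuming a Unix system and Python 3.9+; spawn_and_wait is a made-up helper name, and whether this actually avoids the temporary commit charge depends on how your libc implements posix_spawn (as far as I know glibc uses a vfork-style clone instead of a full fork):

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv[0] with posix_spawn and return its exit code."""
    # One call creates the child and execs the new program. How the child
    # is created (fork, vfork, clone) is up to the libc, not to this code.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    # Hypothetical child: a tiny interpreter run that exits with code 7.
    code = spawn_and_wait([sys.executable, "-c", "raise SystemExit(7)"])
    print("child exited with", code)
```

Python's subprocess module is the more usual way to launch programs; the direct call is used here only to make the fork-free path visible.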
If you have programs that are written to allocate virtual memory sparingly (like postgres) then that should be fine.
However, there is a second way you can be caught out: even if you disable overcommit, your program can still be OOM killed for violating cgroup limits, since cgroup limits always behave as though overcommit is enabled (i.e. they let you allocate more than your limit, and then you get OOM killed when you try to use the allocated memory). This means you'd have to be really careful running e.g. Postgres inside a Kubernetes pod.
This behaviour really sucks IMO. I would like it if you could set overcommit on a per-program basis, so that e.g. Postgres can say "I know what I'm doing - when I allocate virtual memory I want you to really allocate it (and tell me now if you can't...)". I think you can somewhat achieve this with memory locking, but that prevents it from being paged out at all...
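Since the kernel won't refuse the allocation up front, about the best a program can do is look up its own limit and budget against it. A rough sketch, assuming cgroup v2 and that the limit is visible at /sys/fs/cgroup/memory.max inside the container (that path and the helper name read_cgroup_memory_limit are my assumptions, not something any particular runtime guarantees):

```python
from pathlib import Path

# Assumed location of the cgroup v2 limit as seen from inside a container.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")


def read_cgroup_memory_limit(path=CGROUP_V2_LIMIT):
    """Return the cgroup memory limit in bytes, or None if there is none."""
    try:
        raw = path.read_text().strip()
    except OSError:
        return None  # not Linux, cgroup v1, or the file is not visible here
    if raw == "max":
        return None  # "max" means no limit is configured
    return int(raw)


if __name__ == "__main__":
    limit = read_cgroup_memory_limit()
    print("no cgroup limit found" if limit is None else f"limit: {limit} bytes")
```

Knowing the number doesn't change the enforcement model: going over it still gets you OOM killed when the pages are touched, not a failed allocation.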
The fork issue is solved by adding swap. Making sure you have plenty of swap solves these issues, and I'd like to argue that it is more reliable than using overcommit.
It is certainly one way to solve that specific issue, assuming the program was written to take advantage of it. As mentioned, there are several other reasons a program may use a lot of virtual memory though.
Consider this scenario: a process runs a fork(), and shortly after it runs an exec(). Normally, the extra fork only uses a tiny amount of extra memory, because the memory is shared between the parent and the child, until one of them writes to it (copy-on-write).
With overcommit disabled, the kernel must reserve enough space to copy the whole writable RAM of a process when it forks.
So on a 16GB machine, an 8.1GB process cannot spawn any other program through the usual fork + exec routine, because the fork would have to reserve a second 8.1GB and 16.2GB doesn't fit (workarounds exist, like forking before allocating lots of memory and using IPC to instruct the low-memory fork to fork again and launch, but that's way more complicated than a simple fork + exec).
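The arithmetic behind that, as a back-of-the-envelope sketch. The formula is the simplified strict-mode rule (commit limit = swap + a percentage of RAM); I'm assuming the ratio is set to 100 to match the numbers above (the kernel default is 50), ignoring huge pages and the kernel's reserves, and both function names are made up for illustration:

```python
def commit_limit_mb(ram_mb, swap_mb=0, overcommit_ratio=100):
    """Simplified strict-mode commit limit: swap plus a percentage of RAM."""
    return swap_mb + ram_mb * overcommit_ratio // 100


def fork_allowed(limit_mb, committed_mb, writable_mb):
    """fork() must be able to commit a second copy of the writable memory."""
    return committed_mb + writable_mb <= limit_mb


if __name__ == "__main__":
    # 16GB of RAM, no swap, one process with 8.1GB of writable memory.
    no_swap = commit_limit_mb(16000)
    print(fork_allowed(no_swap, 8100, 8100))    # False: 16200 > 16000

    # Same machine with 8GB of swap added.
    with_swap = commit_limit_mb(16000, swap_mb=8000)
    print(fork_allowed(with_swap, 8100, 8100))  # True: 16200 <= 24000
```

With the default ratio of 50 and no swap the limit would be only 8000MB, so the 8.1GB process couldn't even have allocated its memory in the first place.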
So if you have a dedicated DB host and you know that your DB engine is very carefully engineered to work with disabled overcommit, you can disable it. On a general-purpose machine a disabled overcommit will waste lots of RAM that's sitting unused.
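If you want to check which mode a box is actually in, the setting is vm.overcommit_memory, readable from /proc. A small sketch, assuming Linux; read_overcommit_mode is a hypothetical helper name:

```python
from pathlib import Path

OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def read_overcommit_mode(path=Path("/proc/sys/vm/overcommit_memory")):
    """Return the vm.overcommit_memory setting, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None  # not Linux, or /proc is not readable here


if __name__ == "__main__":
    mode = read_overcommit_mode()
    print(OVERCOMMIT_MODES.get(mode, "unknown or unavailable"))
```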
If you mean “why isn’t it disabled by default on Linux installs”: most programs don’t expect malloc(3) to ever return NULL. Those programs will just assume the return value from malloc(3) is valid memory. In the best case, they’ll immediately write to it and segfault. In the worst case, they’ll hold onto this NULL pointer for a while, passing it around, until eventually something else somewhere distant in the program blows up some unknown amount of time later.
Even those programs that are “malloc(3) error aware” often do something stupid and counterproductive in response, like attempting to allocate more memory for an exception object / stack trace / error string.
Programs that do something useful in response to a NULL malloc(3) return (useful for the stability of the system as a whole, and better than what the OOM killer gets you) are rare, even on servers. Usually it’s only stateful, long-running, DBMS-like daemons that 1. bother, and 2. have the engineering effort put into them to do the right thing.
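To make "check the return value" concrete, here is a sketch that calls the C library's malloc through ctypes and handles the NULL case explicitly. It assumes a Unix-like system where ctypes.CDLL(None) gives access to the already-loaded libc; try_malloc is a made-up helper, and a request of SIZE_MAX bytes is used only because it is guaranteed to be refused:

```python
import ctypes

libc = ctypes.CDLL(None)  # the C library already loaded into this process
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]

SIZE_MAX = ctypes.c_size_t(-1).value  # largest value a size_t can hold


def try_malloc(nbytes):
    """Return True if malloc(nbytes) succeeded, False if it returned NULL."""
    ptr = libc.malloc(nbytes)
    if ptr is None:  # ctypes maps a NULL void* to None
        return False  # report the failure instead of using the pointer
    libc.free(ptr)
    return True


if __name__ == "__main__":
    print(try_malloc(64))
    print(try_malloc(SIZE_MAX))
```

Note that with overcommit enabled a non-NULL result only means the address range was handed out; the failure can still arrive later, as an OOM kill when the pages are touched.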
Most of these were likely managed-language programs.
Programs written for managed language runtimes will have a language-runtime-level abort on malloc(3) failure, which is usually well written, in the sense that it will clean up language-runtime-level resources and emit a language-runtime-level error message.
But this language-runtime-level abort usually isn’t exposed to the application in any hookable way, so from the developer’s perspective, it’s basically the same as being OOM killed. There’s no option to clean up e.g. an individual transaction’s resources in order to keep going. There are no hooks for libraries to use to e.g. properly send close messages on sockets (if the language runtime doesn’t do that itself as part of managing socket lifetimes). Etc.
These managed runtimes (e.g. the JVM) may expose a catchable exception for OOM errors, but these are for internal, language-runtime-level OOM errors, triggered by the runtime itself under certain conditions, rather than in response to a syscall failure. When malloc(3) fails, it’s basically “too late” from these runtimes’ perspectives: they no longer have the resources required to allow the user to run any more code.
« Most of these were likely managed-language programs. »
Please don't guess. They weren't.
It is true that a program that aborts as soon as malloc returns failure isn't doing any special cleanup or attempting to keep going.
But that's not at all the same as « Those programs will just assume the return value from malloc(3) is valid memory. In the best case, they’ll immediately write to it and segfault. », which is what I'm informing you is too pessimistic.
I'm not guessing. I'm just answering you by ignoring/disregarding your personal experience, and instead treating you as a random variable sampling the population of people who use all possible software, and then talking about what that random variable would look like.
Why? Because we don't need anecdotes to know this particular thing — we have the data. We know what the random variable actually looks like. (How? Because people have downloaded "all of Github" or "the entire Debian package archive", and run Valgrind over it, and uploaded the resulting dataset to BigQuery!) By the Law of Large Numbers, we can actually do stats about, effectively, what "all software" looks like.
By volume, the majority of POSIX software that calls malloc(3) is incompetently written, with no checks on the return value of malloc(3). This is an objective, verifiable fact.
By volume, the majority of POSIX software that does check the return value of malloc(3) does so because a managed-language compiler emitted a language-runtime-level check into the compiled binary, rather than because of an explicit source-level check. Another objective, verifiable fact.
-----
It so happens that the software making up the "backbone" of an OS / average LAMP server is more competently-written, because it's had a lot more attention and engineering time put into it.
But the same "power law of features" from e.g. Microsoft Office applies here — there's a core set of stuff everyone uses, but every user also has some weird stuff they are in the small minority of users for. And that stuff is what breaks.
As it happens, that lesser-used stuff is also usually mission-critical to the operation of a business; otherwise people wouldn't be driven to use such not-a-lot-of-engineering-effort-put-in software in the first place. People are using this stuff "in anger", if they're using it at all.
Which means that, sadly, insofar as most developers creating business-process IPC pipelines don't already have the hard-won experience to build in fault tolerance for individual processes within that pipeline, we see production systems where these malloc(3) failures are Single Points of Failure for the entire system. The flakiness of these long-tail programs drags down the reliability of most systems as a whole.
Because many programs "unnecessarily" allocate a lot of virtual memory and then never actually page it in, disabling overcommit will make allocations fail, and processes die, for lack of memory even though most of that memory isn't actually being used.
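You can see the gap between "allocated" and "used" for yourself. A sketch, assuming Linux (it reads /proc/self/statm); the 256MiB size and the function names are arbitrary choices for illustration:

```python
import mmap

PAGE = mmap.PAGESIZE


def resident_pages():
    """Resident set size of this process in pages (Linux /proc only)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1])


def reserve_and_touch(size=256 * 1024 * 1024):
    """Map `size` bytes of anonymous memory, write one byte, report RSS growth."""
    before = resident_pages()
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    region[0] = 1  # touches only the start of the region; the rest is never used
    grew = resident_pages() - before
    region.close()
    return size // PAGE, grew


if __name__ == "__main__":
    mapped, grew = reserve_and_touch()
    print(f"mapped {mapped} pages, resident set grew by {grew} pages")
```

With strict accounting the whole private writable mapping counts against the commit limit at mmap time, regardless of how little of it ever becomes resident.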
It's extremely useful to be able to map more virtual memory space than exists physical memory in your computer. This is what makes e.g. mmap'ed access to large files possible.
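A scaled-down sketch of that kind of access: map a sparse file and read one byte near the end without reading the rest. The 256MiB size is arbitrary (the same pattern applies to files larger than RAM on a 64-bit system), and last_byte_of_sparse_file is a made-up name:

```python
import mmap
import tempfile


def last_byte_of_sparse_file(size=256 * 1024 * 1024):
    """Create a sparse file of `size` bytes, map it, and read its last byte."""
    with tempfile.TemporaryFile() as f:
        f.truncate(size)  # sets the length without writing any data
        # Length 0 asks mmap to map the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Only the part of the file holding this byte has to be read in.
            return len(view), view[-1]


if __name__ == "__main__":
    length, value = last_byte_of_sparse_file()
    print(f"mapped {length} bytes, last byte is {value}")
```

Read-only file-backed mappings like this one are backed by the file itself, which is why they don't need to be charged against the commit limit the way private writable memory is.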