by tpetry 1595 days ago
Sure, but setting up swap is still recommended. I guess some internal Linux memory algorithms prefer to have a safety net? Setting up a 1GB zram swap was effective for me. It's not much wasted memory, since servers have so much memory these days, and because of compression it can hold more than 1GB.

What I've seen is that under memory pressure kernel tasks trying to evict or free pages to satisfy an allocation request can race with other tasks dirtying and filling pages even faster, especially via the buffer cache. This can induce patterns of lock contention on low-level VM data structures and flushing procedures that effectively behave like a deadlock. Eventually various loop limits and lock timeouts will help unstick things, but in the worst cases the system gets caught in higher order loops and I've seen systems lock up for minutes. The systems I've seen this on never had any swap.

Some of the heuristics designed to minimize pathological contention latency seem to implicitly assume that the swapping subsystem--both in its ability to help free space, and the latency it introduces when loading and evicting pages--will help mitigate the chance tasks will get caught in a tight contention loop. IOW, the I/O latency of swap effectively induces back pressure on load, helping operations freeing pages to progress faster than operations consuming pages. (Predictably, the faster your swap, the less well this works. When people began putting swap on SSDs, heuristics had to be retuned.)

Arguably the root of the problem is the legacy of overcommit. Even though it can be nominally disabled, many aspects of the kernel were designed with the notion that the only direction to move under memory pressure is forward, relying on the promise of the OOM killer eventually freeing up enough memory to maintain forward progress on the current operation, rather than unwinding and returning a failure condition. The dynamic seems similar to buffer bloat.

Overcommit on POSIX systems is a consequence of the fork/exec architecture for starting new processes. If one disables overcommit, fork must double the memory accounting, and for a memory hog that starts a helper process that may lead to OOM when in fact there is enough memory.

Windows, which does not have fork, does not suffer from this.
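
To put rough numbers on the double-accounting point, here is a minimal sketch in Python. It assumes Linux's `CommitLimit` and `Committed_AS` fields from /proc/meminfo and models strict mode as a simple sum; the real kernel accounting has more cases. The sample figures and the helper names `parse_meminfo` and `fork_fits_strict` are made up for illustration.

```python
# Made-up sample in the /proc/meminfo format (values in kB).
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
CommitLimit:    12000000 kB
Committed_AS:    7000000 kB
"""


def parse_meminfo(text):
    """Return {field: value} for the numeric lines of /proc/meminfo text."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def fork_fits_strict(meminfo, writable_private_kb):
    """Simplified strict-mode check (an assumption, not the kernel's exact rule):
    fork charges a second copy of the parent's writable private memory, and
    the new total must stay under CommitLimit."""
    return meminfo["Committed_AS"] + writable_private_kb <= meminfo["CommitLimit"]


info = parse_meminfo(SAMPLE_MEMINFO)
# A 6 GB memory hog cannot fork a helper here, even though the helper
# would exec immediately and never touch most of those pages.
print(fork_fits_strict(info, 6_000_000))
```

On a live Linux box the same parser can be fed the contents of /proc/meminfo, and /proc/sys/vm/overcommit_memory reports the current mode (2 is the strict one).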

POSIX is not restricted to a fork/exec model; there's also posix_spawn(). Additionally, it may be possible to somehow rely on COW to avoid double accounting after fork, provided that the new process execs "soon after". This would cover cases where posix_spawn() cannot be used directly because some fixup needs to occur before the new process execs.
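
For illustration, a small sketch of the posix_spawn() route using Python's `os.posix_spawn` wrapper (Python 3.8+ on a POSIX system). The stdout redirect stands in for the kind of simple fixup that file actions can express without a fork; `spawn_capture` is a hypothetical helper name. As far as I know glibc implements posix_spawn with a vfork-style clone, so the parent's address space is not duplicated, but treat that as an assumption about the libc in use.

```python
import os
import sys


def spawn_capture(argv):
    """Start argv[0] via posix_spawn with stdout redirected into a pipe.

    Returns (exit_code, captured_stdout_bytes).
    """
    read_fd, write_fd = os.pipe()
    # File action: in the child, make fd 1 (stdout) a copy of the pipe's write end.
    # The original pipe fds are close-on-exec in Python, so they do not leak.
    actions = [(os.POSIX_SPAWN_DUP2, write_fd, 1)]
    pid = os.posix_spawn(argv[0], argv, dict(os.environ), file_actions=actions)
    os.close(write_fd)  # parent keeps only the read end
    with os.fdopen(read_fd, "rb") as pipe:
        output = pipe.read()
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
    else:
        code = -os.WTERMSIG(status)
    return code, output


if __name__ == "__main__":
    print(spawn_capture([sys.executable, "-c", "print('helper ran')"]))
```

Fixups that file actions and spawn attributes cannot express are the cases where fork followed by exec is still needed.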
Without double accounting after the fork, it is in general impossible to guarantee that no running program needs to be killed on OOM, as the kernel does not know what the process is going to do after the fork. And fork still has legitimate uses that do not involve exec. With the multi-process architecture that many programs, like browsers, use for security, it is a common optimization to have a seed process with the sandbox initialized that is forked as necessary into worker processes.
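
A toy version of that seed-process pattern, sketched in Python with `os.fork` (POSIX only). The "expensive initialization" is just a dictionary here, and `run_in_forked_worker` is a hypothetical name; real browsers do considerably more. The point is that the worker never execs, so it keeps using the seed's pages via COW, and the kernel cannot assume those pages will be dropped.

```python
import os


def run_in_forked_worker(task, state):
    """Fork a worker from the already-initialized process, run task(state)
    in it, and return the result as a string read back through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherits `state` copy-on-write; no exec happens.
        exit_code = 1
        try:
            os.close(read_fd)
            os.write(write_fd, str(task(state)).encode())
            exit_code = 0
        finally:
            # Always leave via _exit so the child never runs the parent's code.
            os._exit(exit_code)
    # Parent: collect the result and reap the worker.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    return data.decode()


if __name__ == "__main__":
    # Stand-in for costly setup done once in the seed process.
    seed_state = {"table": list(range(1000))}
    print(run_in_forked_worker(lambda s: sum(s["table"]), seed_state))
```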