by o11c 332 days ago
People keep saying this, yet endless real-world experience shows that systems perform far better if the OOM killer actually gets to kill something, which is only possible with swap disabled. In my experience, the OOM killer picks the right target first maybe 70% of the time, and the rest of the time it kills some other large process and allows enough progress for the blameworthy process to either complete or get OOM'ed in turn. In either case, all is good - whoever is responsible for monitoring the process notices its death and is able to restart it (automatically or manually - the usual culprits are: children of a too-parallel `make`, web browsers, children of systemd, or parts of the windowing environment [the WM and Graphical Shell can easily be restarted under X11 without affecting other processes; Wayland may behave badly here]). If you are launching processes without resilient management (this includes "bubble the failure up until my nth-grandparent handles it") you need to fix that before anything else.
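
To make "resilient management" concrete, here is a minimal sketch of a parent that notices a child was killed by a signal and restarts it. The name `run_with_restart` and the retry limit are illustrative assumptions, not something from the comment above; a real setup would more likely rely on systemd or another supervisor.

```python
import subprocess
import sys


def run_with_restart(cmd, max_restarts=3):
    """Run cmd, restarting it whenever a signal kills it.

    Returns the list of return codes observed, one per attempt.
    """
    codes = []
    for _ in range(max_restarts + 1):
        result = subprocess.run(cmd)
        codes.append(result.returncode)
        # On POSIX, a negative return code -N means "killed by signal N".
        # An OOM kill arrives as SIGKILL, so it shows up as -9.
        if result.returncode >= 0:
            break  # ordinary exit (success or failure): do not restart
    return codes


if __name__ == "__main__":
    # Hypothetical child that SIGKILLs itself, standing in for an OOM victim.
    victim = [sys.executable, "-c",
              "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(run_with_restart(victim, max_restarts=2))
```

The sketch restarts only on signal deaths; whether an ordinary non-zero exit should also trigger a restart is a policy choice left out here.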

With swap enabled, it is very, very, VERY common for the system to become completely unresponsive - no magic-sysrq, no ctrl-alt-f2 to login as root, no ssh'ing in ...

You also have some misunderstandings about overcommit. If you aren't checking `malloc` failure you have UB, but hopefully you will just crash (killing processes is a good thing when the system fundamentally can't fulfill everything you're asking of it!), and there's a pretty good chance the process that gets killed is blameworthy. The real problems are large processes that call `fork` instead of `vfork` (which is admittedly hard to use) or `posix_spawn` (which is admittedly limited and full of bugs), and processes that try to be "clever" and cache things in RAM (for which there's admittedly no good kernel interface).
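
For reference, a minimal sketch of the `posix_spawn` route, using Python's `os.posix_spawn` wrapper (available since Python 3.8). The helper name `spawn_and_wait` is an assumption for illustration, and how the child is actually created underneath is up to the C library.

```python
import os
import sys


def spawn_and_wait(argv):
    """Start argv[0] via posix_spawn and return its exit status."""
    # os.posix_spawn wraps the C library's posix_spawn(); the caller
    # never calls fork() itself.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Killed by a signal: report it as a negative number, like subprocess.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    print(spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"]))
```

Note that `argv[0]` has to be a full path here, since `os.posix_spawn` does not search `PATH` (`os.posix_spawnp` does).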

===

"Swap isn't even that slow for SSDs" is part of the problem. All developers should be required to use an HDD with full-disk encryption, so that they stop papering over their performance bugs.

2 comments

+1 from me for

> With swap enabled, it is very, very, VERY common for the system to become completely unresponsive - no magic-sysrq, no ctrl-alt-f2 to login as root, no ssh'ing in ...

It usually takes only a couple of occasions where you have to get into a distant DC, or wait a couple of hours for some IPMI to be connected, to learn "let it fail fast and gimme ssh back" in practice, versus the theory of "you should have swap on".

Conversely, having critical processes get OOMKilled in critical sections can teach you the lesson that it's virtually impossible to write robust software with the assumption that any process can die at any instruction because the kernel thought it's not that important. OOM errors can be handled; SIGKILL can't.
My only point is that you should have at least a few GB of swap space to smooth out temporary memory spikes, possibly avoiding random processes getting killed at random times, and making it very unlikely that the system will evict your code pages when it's running close to, but below, the memory limit. Without swap, the OOMKiller won't kick in if you're below the limit, but your system will freeze completely - virtually every time the scheduler runs, one core will stall on a disk read.
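
The distinction between a handleable allocation failure and an unhandleable SIGKILL can be sketched as follows. Both helper names are hypothetical, and whether an allocation failure is ever reported at all (rather than the process being killed later) depends on the system's overcommit settings.

```python
import signal


def can_install_handler(signum):
    """Return True if this process is allowed to catch signum."""
    try:
        previous = signal.signal(signum, lambda s, frame: None)
    except OSError:
        return False  # the kernel refuses handlers for SIGKILL and SIGSTOP
    if previous is not None:
        signal.signal(signum, previous)  # restore the earlier handler
    return True


def try_allocate(nbytes):
    """Attempt an allocation; return the buffer, or None on failure."""
    try:
        return bytearray(nbytes)
    except (MemoryError, OverflowError):
        # The request was refused (or the size is not representable),
        # but the program is still running and can back off or clean up.
        return None


if __name__ == "__main__":
    print("SIGTERM catchable:", can_install_handler(signal.SIGTERM))
    print("SIGKILL catchable:", can_install_handler(signal.SIGKILL))
    print("huge allocation refused:", try_allocate(2**62) is None)
```

A refused allocation leaves room for cleanup; a SIGKILL gives the process no chance to run any code at all.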

Conversely, with a few GB of old data paged out to disk, even to a slow HDD, there is going to be much, much less thrashing going on. Chances are, the system will work pretty normally, since it's most likely that memory that isn't being used at all is what will get swapped out, so it's unlikely to need to be swapped in any time soon. The spike that caused you to go over your normal memory usage will die down, memory will get freed naturally, and the worst you'll see is that some process will have a temporary spike in latency some time later when it actually needs those swapped out pages.

Now, if the spike is too large to fit even in RAM + swap, the OOMKiller will still run and the system will recover that way.
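
To see how much has actually been paged out on a given Linux box, a small sketch that parses `/proc/meminfo`-style text. The function names are illustrative assumptions; `SwapTotal`, `SwapFree` and `MemAvailable` are the standard field names in that file.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    """Swap currently in use, in kB (0 when no swap is configured)."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)
```

Passing the contents of `/proc/meminfo` to `parse_meminfo` and then calling `swap_used_kb` on the result gives the amount of swap in use; comparing it against `MemAvailable` over time gives a rough idea of whether the box is parking stale pages or actively thrashing.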

The only situation where you'll get into the state you are describing is if virtually all of your memory pages are constantly getting read and written to, so that the VMM can't evict any "stale" pages to swap. This should be a relatively rare occurrence, but I'm sure there are workloads where this happens, and I agree that in those cases, disabling swap is a good idea.

> If you aren't checking `malloc` failure you have UB, but hopefully you will just crash (killing processes is a good thing when the system fundamentally can't fulfill everything you're asking of it!), and there's a pretty good chance the process that gets killed is blameworthy.

This is a very optimistic assumption. Crashing is about as likely as some kind of data corruption for these cases. Not to mention, crashing (or getting OOMKilled, for that matter) is very likely to cause data loss - a potentially huge issue. If you can avoid the situation altogether, that's much better. Which means overprovisioning and enabling some amount of swap if your workload is of a nature that doesn't constantly churn the entire working memory.

> "Swap isn't even that slow for SSDs" is part of the problem. All developers should be required to use an HDD with full-disk encryption, so that they stop papering over their performance bugs.

You're supposed to design software for the systems you actually target, not some lowest common denominator. If you're targeting use cases where the software will be deployed on 5400 RPM HDDs with full disk encryption at rest running on an Intel Celeron CPU with 512 MB of RAM, then yes, design your system for those constraints. Disable swap, overcommit too, probably avoid any kind of VM technology, etc.

But don't go telling people who are designing for servers running on SSDs to disable swap because it'll make the system unusably slow - it just won't.