by o11c
332 days ago
People keep saying this, yet infinite real-world experience shows that systems perform far better if the OOM killer actually gets to kill something, which is only possible with swap disabled. In my experience, the OOM killer picks the right target first maybe 70% of the time, and the rest of the time it kills some other large process and allows enough progress for the blameworthy process to either complete or get OOM'ed in turn. In either case, all is good: whoever is responsible for monitoring the process notices its death and is able to restart it, automatically or manually. The usual culprits are children of a too-parallel `make`, web browsers, children of systemd, or parts of the windowing environment (the WM and graphical shell can easily be restarted under X11 without affecting other processes; Wayland may behave badly here). If you are launching processes without resilient management (this includes "bubble the failure up until my nth-grandparent handles it"), you need to fix that before anything else.

With swap enabled, it is very, very, VERY common for the system to become completely unresponsive: no magic-sysrq, no ctrl-alt-f2 to log in as root, no ssh'ing in ...

You also have some misunderstandings about overcommit. If you aren't checking for `malloc` failure you have UB, but hopefully you will just crash (killing processes is a good thing when the system fundamentally can't fulfill everything you're asking of it!), and there's a pretty good chance the process that gets killed is blameworthy. The real problems are large processes that call `fork` instead of `vfork` (which is admittedly hard to use) or `posix_spawn` (which is admittedly limited and full of bugs), and processes that try to be "clever" and cache things in RAM (for which there's admittedly no good kernel interface).

===

"Swap isn't even that slow for SSDs" is part of the problem. All developers should be required to use an HDD with full-disk encryption, so that they stop papering over their performance bugs.
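To make the "resilient management" and `posix_spawn` points concrete, here is a rough sketch in Python (3.9+, Unix only). It starts a child with `os.posix_spawn` rather than `fork`, waits for it, and restarts it if it died from SIGKILL, which is the signal the OOM killer sends. The names `run_once` and `supervise` and the restart limit are made up for illustration, and treating every SIGKILL as an OOM kill is an assumption; a real supervisor would check the kernel log or cgroup memory events.

```python
import os
import signal
import sys


def run_once(argv):
    """Spawn argv via posix_spawn and wait for it to finish.

    Returns the exit code; a negative value means the child was killed
    by that signal number (for example -9 for SIGKILL).
    """
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


def supervise(argv, max_restarts=3):
    """Hypothetical helper: rerun the child while it keeps dying from SIGKILL.

    Assumption: SIGKILL is treated as "probably OOM-killed". Returns the
    list of exit codes observed, one per attempt.
    """
    codes = []
    for _ in range(max_restarts + 1):
        code = run_once(argv)
        codes.append(code)
        if code != -signal.SIGKILL:
            break  # normal exit or some other failure: stop restarting
    return codes
```

For example, `supervise([sys.executable, "-c", "pass"])` runs the child once and stops, while a child that keeps getting SIGKILLed is retried until `max_restarts` is used up.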
> With swap enabled, it is very, very, VERY common for the system to become completely unresponsive - no magic-sysrq, no ctrl-alt-f2 to login as root, no ssh'ing in ...
It usually only takes a couple of occasions where you have to travel to a distant DC, or wait a couple of hours for IPMI to be connected, to learn "let it fail fast and gimme ssh back" in practice, as opposed to the theory of "you should have swap on".