Hacker News
by mnw21cam 297 days ago
Install earlyoom or one of its near-equivalents. That mostly solves the problem of it freezing up the system for long periods of time.

I haven't personally seen the OOM killer kill unproductively - usually it kills either a runaway culprit or something that will actually free up enough space to help.

For your "even for systems with half a terabyte of RAM", it is logical that the larger the system, the worse this behaviour is, because when things go sideways there is a lot more stuff to sort out and that takes longer. My work server has 1.5TB of RAM, and an OOM event before I installed earlyoom was not pretty at all.
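For anyone wondering what an earlyoom-style check boils down to, here is a minimal sketch in Python. It is not earlyoom's actual code: the function names are made up for illustration, and the 10% thresholds are my assumption about sensible defaults, so check the earlyoom docs for the real ones. The idea is simply to read `MemAvailable` and `SwapFree` from `/proc/meminfo` and react before the kernel gets into trouble.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def under_pressure(meminfo, mem_min_pct=10.0, swap_min_pct=10.0):
    """Return True when both available RAM and free swap are at or below
    the given percentages (hypothetical thresholds, not earlyoom's own).

    A machine with no swap at all is treated as having 0% free swap.
    """
    mem_pct = 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]
    swap_total = meminfo.get("SwapTotal", 0)
    if swap_total:
        swap_pct = 100.0 * meminfo.get("SwapFree", 0) / swap_total
    else:
        swap_pct = 0.0
    return mem_pct <= mem_min_pct and swap_pct <= swap_min_pct
```

To try it on a real box, pass the contents of `/proc/meminfo` to `parse_meminfo` and the result to `under_pressure`. A real daemon would poll this in a loop and signal a victim process; that part is left out here.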

4 comments

> For your "even for systems with half a terabyte of RAM", it is logical that the larger the system, the worse this behaviour is, because when things go sideways there is a lot more stuff to sort out and that takes longer. My work server has 1.5TB of RAM, and an OOM event before I installed earlyoom was not pretty at all.

I meant it more in the sense that swap doesn't have to be more than a few hundred MB, even on machines with a lot of RAM. It's not the size of the swap file that makes the difference but its presence, and the advice to make it proportional to RAM is largely outdated.

> I haven't personally seen the OOM killer kill unproductively

Ah, the classic Linux fan adage: "never happened to me means never happens ever to anyone".

My favourite things to see with OOM:

killing mysql on the machine which hosts only mysql and is THE production;

and the best one - killing sshd. Of course I can report on that only after seeing it on the tty0 through the BMC/IPMI console or KVM console of a VM.
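One way to make cases like these less likely is to lower the victim's `oom_score_adj`. A rough sketch, assuming a Linux `/proc` layout: the helper name and the `proc_root` parameter are mine, added only so it can be tried against a scratch directory. The kernel accepts values from -1000 to 1000, and -1000 exempts the process from the OOM killer entirely.

```python
import os


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an OOM score adjustment for one process and return the path written.

    value must be in [-1000, 1000]; -1000 exempts the process from OOM kills.
    Lowering the value normally requires root (CAP_SYS_RESOURCE).
    proc_root is a hypothetical knob for testing outside the real /proc.
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = os.path.join(proc_root, str(pid), "oom_score_adj")
    with open(path, "w") as fh:
        fh.write(str(value))
    return path
```

For example, calling `set_oom_score_adj(sshd_pid, -1000)` as root would protect that one sshd process. The setting is per process and does not survive a restart, so for services the usual place for it is the unit file (systemd has an `OOMScoreAdjust=` setting for this).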

nohang has also been a good one for desktops, with friendly notifications under memory stress and sane defaults.

Aside from these complementary tools, the number of systemd traps associated with OOM (OOM score adjustment defaults and restrictions, tmux user sessions killed by default, etc.) has really taken a toll on my nerves over the years. Kernel progress on this has also been underwhelming.

Also, why has Firefox switched off automatic tab unloading when memory is low ONLY FOR LINUX? Much better UX since I turned on browser.tabs.unloadOnLowMemory ...

It's anecdata, but I've had the Linux OOM killer take out OVS (Open vSwitch) on a Kubernetes node several times.

Made me really not mind having a little swap space set up just in case.

OOMKiller, as far as I understand it, will just pick a random page, figure out who owns it, and then kill that process, repeating until enough memory is available. This will bias toward processes with larger memory allocations, but may kill any process.

> If it ever becomes necessary for the OOM Killer to kill processes, the decision of which processes to kill will be made based on something called the OOM score. Each process has an OOM score associated with it.

> Every running process in Linux has an OOM score. The operating system calculates the OOM score for a process, based on several criteria - the criteria are mainly influenced by the amount of memory the process is using. Typically, the OOM score varies between -1000 and 1000. When the OOM Killer needs to kill a process, again, due to the system running low on memory, the process with the highest OOM score will be killed first!

https://learn.redhat.com/t5/Platform-Linux/Out-of-Memory-Kil...
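If you want to see who is currently first in line, the score is exposed per process in `/proc/<pid>/oom_score`. A small sketch that ranks processes by it (the function names and the `proc_root` parameter are my own, for illustration; it only reads files, so it needs no special privileges):

```python
import os


def read_int(path):
    """Return the integer stored in a file, or None if unreadable or malformed."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def rank_by_oom_score(proc_root="/proc", limit=5):
    """Return up to `limit` (score, pid, name) tuples, highest OOM score first."""
    ranked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        score = read_int(os.path.join(proc_root, entry, "oom_score"))
        if score is None:
            continue  # process exited or file not readable
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                name = fh.read().strip()
        except OSError:
            name = "?"
        ranked.append((score, int(entry), name))
    # Highest score first; ties broken by lower pid for stable output.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked[:limit]
```

Calling `rank_by_oom_score()` on a live system gives the most likely victims at that moment. This only reports the kernel's own numbers and makes no attempt to reproduce how the kernel computes them.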