by gkbrk 899 days ago
Swap usually makes it worse. Without swap, there is some chance that the Linux OOM killer does something useful and saves the system. With swap, it becomes a frozen system that never manages to kill anything due to all the swapping. You can wait 5 minutes, 10 minutes, or 15 minutes, but the system never recovers without a reboot.
2 comments

That only works if the system is accessing mostly anonymous pages. If the load on the system is accessing plenty of mmapped code/data pages, it can still thrash those even if swap is disabled. I've still seen systems hanging for 10+ minutes without recovering even though swap was already disabled.

The Linux kernel OOM killer only acts if there's nothing left that can be discarded, which often happens way too late to save the system. You need a user-mode OOM-killer like earlyoom if you want to keep the system responsive.
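To illustrate the idea of acting before the kernel does, here is a minimal Python sketch of the check a user-mode OOM killer might run: read MemAvailable from /proc/meminfo and flag when it falls below a threshold. This is not earlyoom's actual code. The function names and the 10% default are assumptions made up for the example, and a real tool would also have to choose and kill a victim process.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_percent(info):
    """Share of total RAM the kernel estimates is available without swapping."""
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


def memory_is_low(info, min_available_percent=10.0):
    """True when available memory is under the (hypothetical) threshold."""
    return available_percent(info) < min_available_percent


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

A daemon built on this would call memory_is_low(read_meminfo()) in a loop, roughly once a second, and act while the system is still responsive enough to do so.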

Yes, I don't disagree that user-mode OOM-killers are helpful, or that the system can still hang without swap.

I'm just saying that turning on swap, or increasing the swap capacity, does not fix the problem, and it usually makes it worse.

This is why, on production systems, important binaries should mlock all of their code pages into RAM at startup.
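For reference, locking a process's pages at startup is a single mlockall(2) call. The sketch below does it from Python via ctypes; the wrapper name lock_all_memory is hypothetical, and the flag values are the ones used on x86 and ARM Linux, which is an assumption about the target machine.

```python
import ctypes

# Flag values from <sys/mman.h> on x86 and ARM Linux. Assumption: some
# architectures (for example PowerPC and SPARC) define different values.
MCL_CURRENT = 1  # lock every page that is mapped right now
MCL_FUTURE = 2   # also lock pages that get mapped later


def lock_all_memory():
    """Ask the kernel to keep all of this process's pages resident in RAM.

    Returns (True, 0) on success or (False, errno) on failure.
    """
    libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process, incl. libc
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0:
        return True, 0
    return False, ctypes.get_errno()
```

Expect the call to fail for an unprivileged process unless its RLIMIT_MEMLOCK is large enough or it has CAP_IPC_LOCK, so a service would normally check the return value and log the errno at startup.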
That would normally mean you ignore the system getting slower for a long time.
Not a long time. With swap enabled, when a process consumes too much memory, your system goes from perfect performance, to a lagging cursor, to everything being frozen so that you can't even switch to a TTY, all within 5-10 seconds.

Without swap, the system lags for a couple of seconds, the OOM killer frees up memory, and you're good to go again. The only slowdown comes from pages that were kicked out of the file cache, but those quickly come back after the OOM killer does its thing.

What if the thing you kill is in the critical stack for saving your work? If it isn't, I don't really understand why you would be swapping it in a lot.

I would view the OOM solution as a compute-as-cattle thing, but here we are talking about a user desktop where the user can take the best action for themselves once they realize there's a problem.

> If it isn't I don't really understand why you would be swapping it in a lot.

The user doesn't decide which processes are swapped in. If the process gets CPU time and tries to access its data, that data will get swapped in.

> We are talking about a user desktop where the user can take the best action for themselves once they realize there's a problem.

You can't do that with swap, because once you realize there's a problem, you cannot even move your cursor or run commands to take any actions.

This sounds nothing like the gradual leak problem described. An OOM killer is great once you actually run out of resources; removing virtual resources to always run out and play roulette is using it as a fad hammer.
You have only about 10 seconds between the system getting slower and the system locking up completely. If you manage to hit the Magic SysRq key combination [1] to trigger the OOM killer manually, that can save the system, but you have to be quick.

[1] https://en.wikipedia.org/wiki/Magic_SysRq_key
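If a root shell or a small helper is still responsive, the same manual OOM kill can be requested without the keyboard by writing the SysRq command letter f to /proc/sysrq-trigger. A minimal Python sketch follows; the function name is made up for the example, and it has to run as root against the real path.

```python
def trigger_oom_kill(trigger_path="/proc/sysrq-trigger"):
    """Write the SysRq 'f' command, which asks the kernel to invoke the OOM killer."""
    with open(trigger_path, "w") as f:
        f.write("f")
```

The keyboard equivalent is Alt+SysRq+F, which only works if the kernel.sysrq setting allows that function.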