by ornxka 1522 days ago
Swap has always been so slow for me that I just disable it on all of my machines. I would rather the OOM killer just SIGKILL whatever is using all my RAM than deal with slowness (which often persists after the OOM situation is gone).

Yes, but that's not quite how it works. As mentioned in this article, without swap, the system can live-lock before the OOM killer can take care of things. This has been my experience as well.

I had hoped that getting rid of swap would prevent thrashing, but instead the system would live-lock.

I feel like the kernel live-locking in low-memory environments should be treated as a bug and not something we try to solve using swap. Like... if the OOM killer can't free memory under low-memory conditions, something has gone seriously wrong.

I've tried twice to update Lakka (a custom Linux image that runs RetroArch and emulators) on a RPi 3B with 1 gigabyte of memory (and I think no swap). Updating Lakka from the Pi itself invokes a script which downloads a file from the network to SD card storage. This should not take up unbounded amounts of memory, but it seems to trigger OOM livelock or something anyway (I can't tell what it is, since the GUI and SSH session hang, and Lakka doesn't enable Alt+SysRq or TTYs for some godforsaken reason). Enabling SSH and running `watch -n1 free -h` shows a worryingly low amount of free memory, and IIRC the available memory number crept downwards before the GUI and SSH session both hung simultaneously.

Terminating the GUI and performing an offline update from one SSH session while running `watch -n1 sync` avoids the hang. I haven't tried sync while leaving the GUI running.
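
For anyone who wants to log roughly what `watch -n1 free -h` shows, plus how much page cache is still waiting to be written out, here is a minimal Python sketch. It assumes the usual `/proc/meminfo` layout. The helper names (`parse_meminfo`, `summarize`, `log_meminfo`) are made up for illustration. The idea that unwritten (dirty) pages from the download are involved is only a guess prompted by the `sync` workaround, not something established above.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Build a one-line summary of available memory and unwritten page cache."""
    keys = ("MemAvailable", "Dirty", "Writeback")
    return " ".join(f"{k}={fields.get(k, 0) // 1024}MiB" for k in keys)


def log_meminfo(interval=1.0, count=None):
    """Print a summary every `interval` seconds; runs forever if count is None."""
    n = 0
    while count is None or n < count:
        with open("/proc/meminfo") as f:
            print(summarize(parse_meminfo(f.read())), flush=True)
        time.sleep(interval)
        n += 1
```

Calling `log_meminfo()` from a second SSH session and redirecting the output to a file would leave the last few samples on disk after a hang, assuming the write reaches storage before everything stalls.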

Will Linux OOM handling ever be fixed?

The problem is that the OOM killer only runs if no pages can be freed. But on any system, some memory can always be reclaimed even without swap: code pages are backed by their executable/library files on disk, so the kernel can drop them and read them back in later. On a system running only a few processes, this is unlikely to matter.

But if you have a lot of processes, you may end up in a situation where you can free enough pages by dropping all of those code pages. Now you get no OOM kill, but the next time a process gets to run, it will stall until its code is read back in from disk. Then the next process will be scheduled, and it too will need to be paged in, causing another stall, and so on. The machine will probably end up executing, at best, a few instructions per millisecond for hours...
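
This kind of stall can be measured on kernels that expose pressure stall information (PSI, available since Linux 4.20 when enabled). Below is a rough Python sketch, assuming the documented `/proc/pressure/memory` format of `some`/`full` lines with `avg10`, `avg60`, `avg300` and `total` values. The names `parse_pressure` and `is_thrashing` and the 50% threshold are illustrative choices, not something from this thread.

```python
def parse_pressure(text):
    """Parse /proc/pressure/memory text into {'some': {...}, 'full': {...}}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, pairs = parts[0], parts[1:]
        # Each pair looks like "avg10=0.00" or "total=12345".
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def is_thrashing(pressure, threshold=50.0):
    """Return True if 'full' avg10 is at or above threshold (an arbitrary cutoff).

    'full' avg10 is the percentage of the last 10 seconds during which all
    non-idle tasks were stalled on memory at the same time.
    """
    return pressure.get("full", {}).get("avg10", 0.0) >= threshold


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read and parse the current memory pressure (requires PSI support)."""
    with open(path) as f:
        return parse_pressure(f.read())
```

A high `full` value alongside plenty of nominally reclaimable memory would match the scenario above: nothing is out of memory in the kernel's eyes, yet almost no useful work gets done.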

I can't for the life of me find where the article talks about live-lock and the OOM killer. I searched for "lock" and "OOM" and none of the matches were related to that.

He doesn't use the term live-lock, but see "6" at the top of the article, where he talks about "pathological behavior at near-OOM". Also "3": "Disabling swap does not prevent disk I/O from becoming a problem". Those, in my experience, are the situations I was calling live-lock.