by syoc 825 days ago
I strongly dislike swap on servers. I can understand some use cases, such as laptops and one-off situations.

I would much rather have an application get killed by the OOM killer than have it swap. Swapping absolutely kills performance. Not having enough RAM is a faulty state, and swapping hides that from admins, resulting in hard-to-debug issues. The OOM killer leaves handy logs; swapping just degrades your service and has to be correlated with RAM usage metrics.
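To illustrate the "handy logs" point, here is a minimal sketch of pulling OOM kills out of kernel log text (for example the output of `dmesg`). It assumes the kernel reports a kill as "Out of memory: Killed process PID (name)"; the exact wording varies between kernel versions, and the sample line is made up for illustration, not taken from a real system.

```python
import re

# Assumed message shape; older kernels print "Kill process", newer ones
# "Killed process". Adjust the pattern to what your kernel actually logs.
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a (pid, process_name) tuple for each OOM kill in the log text."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_RE.finditer(log_text)]


# Illustrative log line (hypothetical values).
sample = (
    "[12345.678901] Out of memory: Killed process 4242 (java) "
    "total-vm:8388608kB, anon-rss:4194304kB"
)
print(find_oom_kills(sample))
```

Swapping leaves no comparable single log line, which is why it has to be inferred from metrics instead.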

My experience is also that swap will be used no matter how low (or was it high?) you set the swappiness number if the memory throughput is high enough, even if there is enough RAM available.

6 comments

We live in a world where you are charged per megabyte of RAM you allocate to your VMs. Sure, they have the occasional peak that lasts for a few seconds, but if you provision RAM for that peak, it's costing you money. The cheap way out is to give it swap.

My rule of thumb is that under average load there should be no swapping, meaning that vmstat or whatever should be showing mostly 0s in the swap columns. That doesn't mean it has 0 bytes in swap; in fact it probably is using some swap. It means nothing in swap is in the working set. For example, the server I'm looking at now is showing 0 swap activity, has 2GB of RAM and is using 1.3GB of swap. When a peak hits you will get some small delays of course, but it's likely no one will notice.
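The difference between swap activity and swap usage can be sketched as follows. This assumes a Linux system, where `/proc/vmstat` exposes the cumulative `pswpin` and `pswpout` page counters and `/proc/meminfo` exposes `SwapTotal` and `SwapFree` in kB; the function names and the 5-second interval are just illustrative choices.

```python
import time


def parse_counters(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Pages swapped (in, out) between two /proc/vmstat snapshots."""
    return (after["pswpin"] - before["pswpin"],
            after["pswpout"] - before["pswpout"])


def swap_used_kb(meminfo_text):
    """Swap currently occupied, from SwapTotal and SwapFree in /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("SwapTotal", "SwapFree"):
            fields[name] = int(rest.split()[0])
    return fields["SwapTotal"] - fields["SwapFree"]


def sample_live(interval=5.0):
    """Take two snapshots on a live Linux box and report activity and usage."""
    with open("/proc/vmstat") as f:
        first = parse_counters(f.read())
    time.sleep(interval)
    with open("/proc/vmstat") as f:
        second = parse_counters(f.read())
    with open("/proc/meminfo") as f:
        used = swap_used_kb(f.read())
    return swap_activity(first, second), used
```

A result like `((0, 0), 1363148)` from `sample_live()` would match the situation described: over a gigabyte sitting in swap, with none of it being touched.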

Doing anything else leaves money on the table.

Swap (paging) can also help performance. It exists for a reason. Having metrics on paging helps you tune your application, so it is also good for observability. It is a feature that can be misused, but it is not a good recommendation to turn it off without knowledge of the specific situation.
None of my laptops/desktops/gaming rigs or servers feature swap. Swapping along with a managed language and garbage collection is next to indistinguishable from an application crash.
Is it? What about the page cache impacted by those applications’ file I/O?
It is. Try running a stop-the-world type of garbage collection along with page-in/page-out, etc.

>cache impacted by those applications’ file I/O

Which cache? The disk one? That depends on the available memory, with pretty much all free memory being used as disk cache. Once a system is swapping, there is effectively no disk cache left.

That is not how paging works. The swap area is also a cache in every sense of the word. And the kernel will swap out pages that clearly are accessed less frequently than another page, even if that page is the buffer cache. When used like that, swapping is a way of getting more disk cache.

Generally speaking, systems do not swap out pages only under memory pressure. That design would be ineffective. When memory pressure is high enough, you've already lost.

I can see the benefit to not having swap in a server scenario, but to offer a counterpoint: it seems like IT likes to under-allocate servers by something like 25%, so if you have a server with 256GB of RAM, by design it should never use more than 192GB. That’s a lot of RAM going to waste for the off-chance usage jumps above 75-80%.
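The arithmetic behind those numbers, as a quick sketch (the `headroom` helper and its parameter names are made up for illustration):

```python
def headroom(total_gb, reserve_fraction):
    """Split total RAM into (usable, reserved) for a given reserve fraction."""
    usable = total_gb * (1 - reserve_fraction)
    return usable, total_gb - usable


# 256GB server with 25% held back: 192GB usable, 64GB sitting idle.
print(headroom(256, 0.25))  # (192.0, 64.0)
```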

I think I would rather have the server’s SSD be an Optane drive (or some other high-endurance flash memory) with a swap partition, and use some other means of monitoring and being alerted to high memory usage.

Except that there's nothing you can do if it suddenly starts swapping, and you basically lose a server once it starts swapping because of how slow it gets.
I’m not sure what this scenario is but it sounds like you’d be F’d with or without swap
A lot of swap on servers is bad. A little bit is fine though and can actually be helpful in certain scenarios. I've seen servers with 256GB of RAM configured to have <1GB of swap, so in case of a runaway process the swap fills up very quickly and doesn't delay the OOM killer, but still it helps the Linux memory manager to run more optimally.
No, swapin kills performance. Swapout is always efficient and always just makes your RAM more efficient.

Any normal workload has hot and cold pages.

A process randomly dying is a faulty state. Making malloc fail and having programs respond appropriately is far preferable to just randomly trashing processes.