Hacker News
by quotemstr 3078 days ago
Very few people on this thread read and understood the article. The point isn't working with data sets larger than RAM. The point is making better use of the RAM you do have by taking pages you'll almost never touch and spilling them to disk so that there's more room in RAM for pages you will touch.

Banning swap is like making self-storage companies illegal and forcing everyone to hold all possessions in their homes. Sure, you'd be able to get to grandma's half-broken kitschy dog coaster that you can't bring yourself to throw away, but it'd also be harder to fit and find your own stuff, the stuff you need all the time.

If you find yourself driving to and from the self storage place every day, you probably need a bigger home. But self storage is plenty useful even if you almost never visit it.


The issue is that the current OOM killer doesn't support this usage at all.

To extend the analogy: what do you do if grandma comes and fills your house with stuff? You need space to work, so you go and drop it off at the self storage place, but what if she just keeps filling your house up?

The OOM killer will do absolutely nothing until both your house and the whole self storage place are totally full. By that point, you've spent a huge amount of time just driving to and from self storage, so you haven't had time to do any actual work; it would probably have been better to tell grandma that you don't want any more stuff once she filled up your house for the first time.

Well, it doesn't help that when grandma calls and asks whether you have room for more stuff, the Linux kernel responds on your behalf, "Yes, of course I have room. I live in a TARDIS." And you then do all the driving to the self-storage facility to maintain the illusion as long as you can. I really don't like overcommit.

Anyway, I agree with you that this behavior is annoying, but I think it ought to be possible to fix it (e.g., with memory cgroups or something like Android's lmkd) without giving up on the idea of spilling infrequently-accessed private dirty pages to disk.

The analogy is now getting in the way, rather than helping to clarify.
One problem with relying on the OOM killer in general is that the OOM killer is only invoked in moments of extreme starvation of memory. We really have no ability currently in Linux (or basically any operating system using overcommit) to determine when we're truly "out of memory", so the main metric used is our success or failure to reclaim enough pages to meet new demands.

As for the analogy -- there are metrics you can use today to bat away grandma before she starts hoarding too much. We have metrics for how much grandma is putting in the house (memory.stat) and for the rate at which we kick our own stuff out of the house to appease grandma, only to realise we removed stuff we actually need (memory.stat -> workingset_refault), among others. Using these and Johannes' recent work on memdelay (see https://patchwork.kernel.org/patch/10027103/ for some recent discussion), it's possible to see memory pressure before it actually impacts the system and drives things into swap.
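A minimal sketch of watching those signals from userspace, assuming cgroup v2's memory.stat format ("key value" per line) and a kernel with PSI (pressure stall information, which is what the memdelay work eventually became) exposed at /proc/pressure/memory. The functions take text rather than paths so they can be fed from the real files or from test strings. Counter names have changed across kernel versions (newer kernels split workingset_refault into anon and file variants), so check what your kernel actually reports.

```python
def parse_memory_stat(text):
    """Parse cgroup v2 memory.stat ('key value' per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[key] = int(value)
    return stats


def refault_rate(before, after, seconds):
    """Refaults per second between two memory.stat snapshots.

    Older kernels report one 'workingset_refault' counter; newer ones split it
    into '_anon' and '_file' variants, so sum whichever keys are present.
    """
    keys = ("workingset_refault", "workingset_refault_anon", "workingset_refault_file")
    delta = sum(after.get(k, 0) - before.get(k, 0) for k in keys)
    return delta / seconds


def parse_pressure(text):
    """Parse a PSI file such as /proc/pressure/memory into {'some': {...}, 'full': {...}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return result
```

For example, `parse_pressure(open("/proc/pressure/memory").read())["some"]["avg10"]` gives the percentage of the last ten seconds in which at least some tasks were stalled waiting on memory, which tends to climb well before anything gets OOM-killed.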

> One problem with relying on the OOM killer in general is that the OOM killer is only invoked in moments of extreme starvation of memory. We really have no ability currently in Linux (or basically any operating system using overcommit) to determine when we're truly "out of memory", so the main metric used is our success or failure to reclaim enough pages to meet new demands.

The problem with relying on swap instead of the OOM killer is that, instead of the OOM killer, the user gets invoked in moments of extreme starvation of memory and the whole machine gets rebooted. The OOM killer is far gentler; it only kills processes until the extreme starvation is resolved.
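As for how the OOM killer decides what to kill: here is a simplified sketch loosely modeled on Linux's oom_badness() from the 4.x era. It is an assumption-laden approximation; the real function has more special cases and has changed across versions. The score is roughly the task's memory footprint in pages (resident, swapped, and page tables), shifted by oom_score_adj in units of one-thousandth of total memory, and the highest scorer is killed. The Task type and its field names are hypothetical.

```python
from dataclasses import dataclass

OOM_SCORE_ADJ_MIN = -1000  # this oom_score_adj value exempts a task entirely


@dataclass
class Task:
    name: str
    rss_pages: int          # resident pages
    swap_pages: int         # pages currently in swap
    pgtable_pages: int      # page-table overhead, in pages
    oom_score_adj: int = 0  # -1000..1000, as set via /proc/<pid>/oom_score_adj


def badness(task, total_pages):
    """Simplified OOM 'badness' score; higher means more likely to be killed."""
    if task.oom_score_adj == OOM_SCORE_ADJ_MIN:
        return 0  # never selected
    points = task.rss_pages + task.swap_pages + task.pgtable_pages
    # oom_score_adj shifts the score in units of total_pages / 1000.
    points += task.oom_score_adj * (total_pages // 1000)
    return max(points, 1)  # killable tasks always score at least 1


def pick_victim(tasks, total_pages):
    """Return the task with the highest badness score, or None if none is killable."""
    killable = [t for t in tasks if badness(t, total_pages) > 0]
    return max(killable, key=lambda t: badness(t, total_pages), default=None)
```

In a scenario like the swap-churning one described further down, the churning process dominates the score, so one kill usually frees enough memory for everything else to recover.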

Well, just don't allow overcommit, problem solved.
Disallowing overcommit still doesn't solve the whole problem: you can just burn all of RAM and all of swap in commit charge, then swap. Another failure mode is excessive paging IO causing the kernel to spill private memory to the swap file prematurely, preferring instead to fill RAM with dirty disk-backed pages that only later get written out to disk. When your system is in this state, accessing even actively used pages (say, your window manager's heap) might incur a slow hard fault.
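To put numbers on the first point: under strict accounting (vm.overcommit_memory=2), the kernel's overcommit documentation gives the limit as CommitLimit = (RAM - hugetlb) * overcommit_ratio / 100 + swap, and /proc/meminfo reports it alongside Committed_AS. A minimal sketch of that formula plus a /proc/meminfo parser, with kB units assumed throughout:

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50, hugetlb_kb=0):
    """CommitLimit under vm.overcommit_memory=2, per the kernel's documented formula."""
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


def parse_meminfo(text):
    """Parse /proc/meminfo lines like 'MemTotal:  1014656 kB' into {name: int}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info
```

On a machine with 1GB of RAM and 1GB of swap, `commit_limit_kb(1048576, 1048576, 100)` is 2097152 kB, i.e. all of RAM plus all of swap, so a workload can stay within the commit limit and still spend its life paging.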

(OSes have gotten a little more resilient against this scenario over the years, but it illustrates the issue.)

Memory pool balancing is a really hard control theory problem! I don't blame people for taking the RAM size efficiency hit and just turning off swap entirely. I just think it's a shame to have to resort to that extreme.

I wonder if ARC can be used for replacement policy:

https://en.wikipedia.org/wiki/Adaptive_replacement_cache

Although I guess it's patent encumbered...
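For anyone curious what ARC actually does, here is a minimal Python sketch of its replacement logic as described by Megiddo and Modha. It tracks page keys only, uses integer division in the adaptation step where the paper uses real-valued ratios, and is not how any particular kernel does reclaim. T1 holds pages seen once recently, T2 pages seen at least twice; B1 and B2 are "ghost" lists of recently evicted keys, and hits in them shift the target size p toward recency or frequency.

```python
from collections import OrderedDict


class ARCCache:
    """Adaptive Replacement Cache over page keys (first item = LRU, last = MRU)."""

    def __init__(self, capacity):
        self.c = capacity
        self.p = 0                                      # target size of T1
        self.t1, self.t2 = OrderedDict(), OrderedDict()  # resident pages
        self.b1, self.b2 = OrderedDict(), OrderedDict()  # ghost keys (evicted)

    def _replace(self, key):
        # Evict from T1 if it is over its target p, otherwise from T2;
        # the evicted key is remembered in the matching ghost list.
        if self.t1 and ((key in self.b2 and len(self.t1) == self.p) or len(self.t1) > self.p):
            old, _ = self.t1.popitem(last=False)
            self.b1[old] = None
        else:
            old, _ = self.t2.popitem(last=False)
            self.b2[old] = None

    def access(self, key):
        """Reference `key`; return True on a cache hit, False on a miss."""
        if key in self.t1:                 # second touch: promote to T2
            del self.t1[key]
            self.t2[key] = None
            return True
        if key in self.t2:                 # frequent page: refresh position
            self.t2.move_to_end(key)
            return True
        if key in self.b1:                 # ghost hit: favour recency, grow p
            self.p = min(self.c, self.p + max(len(self.b2) // len(self.b1), 1))
            self._replace(key)
            del self.b1[key]
            self.t2[key] = None
            return False
        if key in self.b2:                 # ghost hit: favour frequency, shrink p
            self.p = max(0, self.p - max(len(self.b1) // len(self.b2), 1))
            self._replace(key)
            del self.b2[key]
            self.t2[key] = None
            return False
        # Complete miss: make room, trimming ghost lists so the directory stays <= 2c.
        if len(self.t1) + len(self.b1) == self.c:
            if len(self.t1) < self.c:
                self.b1.popitem(last=False)
                self._replace(key)
            else:
                self.t1.popitem(last=False)
        else:
            total = len(self.t1) + len(self.t2) + len(self.b1) + len(self.b2)
            if total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)
                self._replace(key)
        self.t1[key] = None
        return False
```

With a capacity of 2, the sequence 1, 1, 2, 3, 1 keeps page 1 resident even though 2 and 3 arrive after it, because the second touch moved it into T2; a plain LRU of the same size would behave the same here, but ARC also resists one-off scans in general.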

> Very few people on this thread read and understood the article.

Hmm. I read the article and I think I understood it. However, in my experience, you run out of RAM if and only if your working set is too big. In my experience, all involved find it desirable to reduce the size of the working set as quickly as possible. Your experience seems to differ.

> The point isn't working with data sets larger than RAM. The point is making better use of the RAM you do have by taking pages you'll almost never touch and spilling them to disk so that there's more room in RAM for pages you will touch.

Your reasoning is too sloppy. It supports neither your blanket statements nor your pained analogy.

You appear to presuppose that:

(1) The kernel can predict which pages the user will "almost never touch."

(2) Mispredicting which pages will be "almost never touched" is of relatively low cost.

(3) Swapping pages that the user will "almost never touch" to disk frees up an appreciable amount of RAM.

(4) When pulling those pages back from disk, the work held up is, on average, less important than whatever we got to do with the RAM in the meantime.

I disagree with (1). Like I said elsewhere in the comments on this article, the kernel cannot reliably predict whether a process will "almost never touch" a given page. The kernel does not have sufficiently detailed knowledge of the process's purpose or access patterns.
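For context on what the kernel does have to go on: page reclaim works from hardware "accessed" bits, i.e. whether a page was touched recently, not from any knowledge of the program's intent. A textbook version of that idea is the second-chance (CLOCK) algorithm, sketched here in Python purely as an illustration; Linux's real reclaim uses active/inactive LRU lists and is considerably more involved, and the class and method names are hypothetical.

```python
class ClockReplacer:
    """Second-chance (CLOCK) page replacement over a fixed number of frames."""

    def __init__(self, frames):
        self.frames = [None] * frames       # page held in each frame
        self.referenced = [False] * frames  # simulated hardware accessed bit
        self.where = {}                     # page -> frame index
        self.hand = 0

    def access(self, page):
        """Touch `page`; return the evicted page, or None if nothing was evicted."""
        if page in self.where:
            self.referenced[self.where[page]] = True  # hit: accessed bit set
            return None
        # Miss: sweep the hand, clearing accessed bits (the "second chance"),
        # until it reaches a frame whose bit is already clear.
        while self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)
        victim = self.frames[self.hand]
        if victim is not None:
            del self.where[victim]
        self.frames[self.hand] = page
        self.where[page] = self.hand
        self.referenced[self.hand] = True
        self.hand = (self.hand + 1) % len(self.frames)
        return victim
```

The only notion of "cold" here is "not touched since the hand last passed", which is exactly the kind of guess the argument above is about: it works well for genuinely idle pages and says nothing about which cold page is on someone's critical path.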

I also disagree with (2). The consequences of getting these predictions wrong seem to be very bad. When lots of mispredictions happen in a tight cluster, the kernel and all running processes will be stopped when the user forcibly bounces the machine. If you let the OOM killer run instead of swapping, the kernel stays up and only a few running processes die. Having a working set whose size is larger than RAM but smaller than RAM + swap seems to be a recipe for a very long cluster of such mispredictions and a human intervention.
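To make the "larger than RAM but smaller than RAM + swap" case concrete, here is a toy model. It assumes strict LRU and a purely cyclic access pattern, both of which real kernels and workloads only approximate. Swap size doesn't appear at all: as long as the working set fits in RAM + swap, nothing gets OOM-killed, and only the fault count changes.

```python
from collections import OrderedDict


def lru_faults(working_set, ram_pages, passes):
    """Count page faults for cyclic access over `working_set` pages with LRU RAM."""
    resident = OrderedDict()
    faults = 0
    for _ in range(passes):
        for page in range(working_set):
            if page in resident:
                resident.move_to_end(page)  # hit: now most recently used
            else:
                faults += 1                 # miss: page comes back from swap
                if len(resident) >= ram_pages:
                    resident.popitem(last=False)  # evict least recently used
                resident[page] = None
    return faults
```

`lru_faults(100, 100, 5)` returns 100 (only the first pass faults), while `lru_faults(101, 100, 5)` returns 505: one page over RAM and every single access faults, because LRU always evicts the page that will be needed next.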

I am curious to hear about workloads where (3) occurs. (Non-latency-sensitive Java code that doesn't churn objects too fast? You've allocated a heap of a certain size, and the half or so that's free doesn't get disturbed too much.)

Regarding (4), even if the kernel could reliably predict cold pages, "page will almost never be touched" isn't necessarily the right criterion for swapping a page to disk. What if reading from the page will be on the critical path for something users do care about, such as logging in and killing a misbehaving process?

> (1) The kernel can predict which pages the user will "almost never touch."

> I disagree with (1). [...] The kernel does not have sufficiently detailed knowledge of the process's purpose or access patterns.

You're in for quite a surprise, particularly on desktop. I have a number of processes with some pages swapped out, and I see no impact on interacting with the said processes. Firefox, gDesklets, a volume changer, and several instances of rxvt are among them.

> (2) Mispredicting which pages will be "almost never touched" is of relatively low cost.

> I also disagree with (2). The consequences of getting these predictions wrong seem to be very bad.

Only in the case of repeated mispredictions, which only happens if you really have low RAM and are on a good way to invoke OOM killer anyway. With (1) being quite accurate (mainly because swapping out unused pages is not that aggressive), (2) magically becomes true as well.

> You're in for quite a surprise, particularly on desktop. I have a number of processes with some pages swapped out, and I see no impact on interacting with the said processes. Firefox, gDesklets, a volume changer, and several instances of rxvt are among them.

Is an appreciable amount of RAM freed up here? I was under the impression that Firefox churned through whatever it allocated (garbage-collected Javascript VM) and rxvt had a very small footprint, most of which is code shared among all of your rxvt instances.

>> I also disagree with (2). The consequences of getting these predictions wrong seem to be very bad.
>
> Only in the case of repeated mispredictions, which only happens if you really have low RAM and are on a good way to invoke OOM killer anyway.

Even if things were as rosy as you suggest, isn't that my point? Better that the OOM killer cleans something up than I bounce the machine and clean everything up. That said, the OOM killer won't necessarily run anytime soon:

I just spun up a VM with 1GB of memory and 1GB of swap. 'time ssh guest echo hello' from the host usually takes anywhere from 140ms to 1.2s. I wrote a C program that allocates a gigabyte (in two 512MB pieces) and churns it through swap by writing to random bytes. 'time ssh guest echo hello' now takes 4-8 seconds. The oom killer didn't run once in the five minutes I ran the swap-churning process. Setting /proc/sys/vm/swappiness to 0 didn't change the symptoms; 'time ssh guest echo hello' still takes 4-8 seconds. This is on Linux 4.9.65.
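The C program itself isn't shown, so here is a rough Python analogue of what it is described as doing, as a hypothetical reconstruction: allocate a total size split across a few buffers, then write to random bytes so that pages keep getting touched in no particular order. The parameter names are assumptions.

```python
import random


def churn(total_bytes, pieces=2, writes=1_000_000, seed=None):
    """Allocate `total_bytes` across `pieces` buffers, then write to random bytes.

    When total_bytes approaches the size of RAM, the random writes keep hitting
    pages that may have been swapped out, forcing constant swap-in/swap-out.
    """
    rng = random.Random(seed)
    size = total_bytes // pieces
    buffers = [bytearray(size) for _ in range(pieces)]
    for _ in range(writes):
        buf = buffers[rng.randrange(pieces)]
        buf[rng.randrange(size)] = rng.randrange(256)
    return buffers
```

Something like `churn(1 << 30, pieces=2, writes=10**9)` on a 1GB VM would roughly approximate the experiment, though interpreter overhead means far fewer writes per second than the C version.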

If I crank the number of churning threads up from 1 to 8, the one 'time ssh guest echo hello' I tried took 33 seconds. I am not patient enough to see what happens with 64 churning threads, which is entirely reasonable, but I would expect the latency involved in rescuing the machine to cause any reasonable administrator to simply bounce it.

In this workload, the kernel is consistently failing to predict which pages are unimportant; the mispredictions are expensive; the RAM saved by swapping out bash, sshd, and killall (or whatever) is negligible; and the important work of allowing remote login to diagnose and clean up the mess is held up unconscionably long to make room for what, in practical instances, is a user error.

I did a 'swapoff -a' and ran the same C program and it gets killed almost immediately.

> self storage is plenty useful

With self-storage rising to over $300 per month, it's more cost effective to take the stuff to the dump and buy it again if it is ever needed.

Well, it depends where you live. In Buffalo, you can get self-storage for around $25/month, if internets are to be believed.
"Very few people on this thread read and understood the article."

I started to read the article, and then thought, "I know this, who doesn't know this?" and stopped.

"The point is making better use of the RAM you do have by taking pages you'll almost never touch and spilling them to disk so that there's more room in RAM for pages you will touch."

Exactly. Who with any technical experience in this day and age doesn't understand that. Are there really people trying to argue against swap?

> Exactly. Who with any technical experience in this day and age doesn't understand that

You're on a site infamous for the comment "I switch to Node when I want to be close to the metal".

Is this a joke or was there actually such a comment? If so, can you link it?

To someone like me, who usually lives somewhere between C++ and shader code, it sounds a bit too strange to be true.

In theory swap is useful, in practice it can be less so https://news.ycombinator.com/item?id=16147634
If I hit my thumb with a hammer, that doesn't mean the hammer isn't useful. The edge cases with swap are also entirely useless arguments against swap.
Feel free to explain it to me.

" Under no/low memory contention

[...]

Without swap: We cannot swap out rarely-used anonymous memory, as it’s locked in memory. While this may not immediately present as a problem, on some workloads this may represent a non-trivial drop in performance due to stale, anonymous pages taking space away from more important use."

Now imagine that I have no memory contention. In other words I've got 8 Gigs of memory and I have never run out of memory. The OOM killer has never run. I've never even come close. How exactly is this representing a non-trivial drop in performance?

To be fair, if I put some of my long running processes into swap, I could cache more files, but I really don't see how this represents a statistically significant improvement. I honestly can't think of anything else.

If you sometimes run out of memory (or even get close), then you should have some swap. This seems fairly obvious to me. Relying on the OOM killer to "clean things up" is pretty dubious. But was there ever any serious argument to do this? I've literally never heard of that before.

I'd be very happy to hear something enlightening about this, but I didn't see anything in the article (perhaps I missed it).

> If you sometimes run out of memory (or even get close), then you should have some swap. This seems fairly obvious to me. Relying on the OOM killer to "clean things up" is pretty dubious. But was there ever any serious argument to do this? I've literally never heard of that before.

Why does that seem obvious to you? With swap, running low on memory is game over. Without swap, the OOM killer runs. You can call the OOM killer dubious, graceless, or any number of other things, but it gets the system responsive again without doing as much damage as the human intervention that's otherwise required.

I mean, it really depends on your application how non-trivial the performance improvement will be, but this statement isn't theoretical -- memory bound systems are a major case where being able to transfer out cold pages to swap can be a big win. In such systems, having optimal efficiency is all about having this balancing act between overall memory use without causing excessive memory pressure -- swap can not only reduce pressure, but often is able to allow reclaiming enough pages that we can increase application performance when memory is the constraining factor.
The real question is why those pages are being held in RAM. If they're needed, swapping them out will induce latency. If they're a leak or not needed, the application should be fixed to not allocate swathes of RAM it does not use.
There are some systems which are memory-bound by nature, not as a consequence of poor optimisation, so it's not really as simple as "needed" or "not needed". As a basic example, in compression, more memory available means that we can use a larger window size, and therefore have the opportunity to achieve higher compression ratios. There are plenty of more complex examples -- a lot of mapreduce work can be made more efficient with more memory available, for example.
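To illustrate the compression point at small scale, here is a sketch using Python's zlib, whose window tops out at 32 KB (wbits=15). The data is an assumption chosen to make the effect obvious: a random 4 KB block repeated eight times. With a 512-byte window (wbits=9) the compressor can't see back to the previous copy, so the output is about as large as the input; with the full window each repeat becomes a cheap back-reference. Tools like lrzip apply the same idea with far larger windows sized to available RAM.

```python
import os
import zlib


def compressed_size(data, wbits):
    """Size of `data` after zlib compression with a 2**wbits byte window."""
    comp = zlib.compressobj(level=9, wbits=wbits)
    return len(comp.compress(data) + comp.flush())


block = os.urandom(4096)  # random bytes: incompressible on their own
data = block * 8          # 32 KB; each copy repeats 4 KB after the previous one

small = compressed_size(data, wbits=9)   # 512-byte window: repeats out of reach
large = compressed_size(data, wbits=15)  # 32 KB window: repeats are found
print(f"input={len(data)} small_window={small} large_window={large}")
```

Only the window (and therefore the memory the compressor may use) differs between the two calls.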
Indeed. None of the above are typically used (as in most of the time) on desktop systems, where swap is the most problematic. As for compression, the only engines I know of that want more than 128 MB of RAM are lrzip and other rzip derivatives.

Common offenders that bog down the system in swap for me as a developer are the web browser, the JVM (Android), and Electron-based apps (messengers, too).

I would also like a source that substantiates the claim that using swap in map-reduce workloads actually helps. Or in database workloads. Or on any machine with a relatively fixed workload.