by tmyklebu 3082 days ago
> Very few people on this thread read and understood the article.

Hmm. I read the article and I think I understood it. However, in my experience, you run out of RAM if and only if your working set is too big. In my experience, all involved find it desirable to reduce the size of the working set as quickly as possible. Your experience seems to differ.

> The point isn't working with data sets larger than RAM. The point is making better use of the RAM you do have by taking pages you'll almost never touch and spilling them to disk so that there's more room in RAM for pages you will touch.

Your reasoning is too sloppy. It supports neither your blanket statements nor your pained analogy.

You appear to presuppose that:

(1) The kernel can predict which pages the user will "almost never touch."

(2) Mispredicting which pages will be "almost never touched" is of relatively low cost.

(3) Swapping pages that the user will "almost never touch" to disk frees up an appreciable amount of RAM.

(4) When pulling those pages back from disk, the work held up is, on average, less important than whatever we got to do with the RAM in the meantime.

I disagree with (1). Like I said elsewhere in the comments on this article, the kernel cannot reliably predict whether a process will "almost never touch" a given page. The kernel does not have sufficiently detailed knowledge of the process's purpose or access patterns.

I also disagree with (2). The consequences of getting these predictions wrong seem to be very bad. When lots of mispredictions happen in a tight cluster, the kernel and all running processes will be stopped when the user forcibly bounces the machine. If you let the OOM killer run instead of swapping, the kernel stays up and only a few running processes die. Having a working set whose size is larger than RAM but smaller than RAM + swap seems to be a recipe for a very long cluster of such mispredictions and a human intervention.

I am curious to hear about workloads where (3) occurs. (Non-latency-sensitive Java code that doesn't churn objects too fast? You've allocated a heap of a certain size, and the half or so that's free doesn't get disturbed too much.)

Regarding (4), even if the kernel could reliably predict cold pages, "page will almost never be touched" isn't necessarily the right criterion for swapping a page to disk. What if reading from the page will be on the critical path for something users do care about, such as logging in and killing a misbehaving process?


> (1) The kernel can predict which pages the user will "almost never touch."

> I disagree with (1). [...] The kernel does not have sufficiently detailed knowledge of the process's purpose or access patterns.

You're in for quite a surprise, particularly on desktop. I have a number of processes with some pages swapped out, and I see no impact on interacting with the said processes. Firefox, gDesklets, a volume changer, and several instances of rxvt are among them.

> (2) Mispredicting which pages will be "almost never touched" is of relatively low cost.

> I also disagree with (2). The consequences of getting these predictions wrong seem to be very bad.

Only in the case of repeated mispredictions, which happen only if you really are low on RAM and well on your way to invoking the OOM killer anyway. With (1) being quite accurate (mainly because swapping out unused pages is not that aggressive), (2) magically becomes true as well.

> You're in for quite a surprise, particularly on desktop. I have a number of processes with some pages swapped out, and I see no impact on interacting with the said processes. Firefox, gDesklets, a volume changer, and several instances of rxvt are among them.

Is an appreciable amount of RAM freed up here? I was under the impression that Firefox churned through whatever it allocated (garbage-collected JavaScript VM) and that rxvt had a very small footprint, most of which is code shared among all of your rxvt instances.

>> I also disagree with (2). The consequences of getting these predictions wrong seem to be very bad.
>
> Only in the case of repeated mispredictions, which happen only if you really are low on RAM and well on your way to invoking the OOM killer anyway.

Even if things were as rosy as you suggest, isn't that my point? Better that the OOM killer cleans something up than I bounce the machine and clean everything up. That said, the OOM killer won't necessarily run anytime soon:

I just spun up a VM with 1GB of memory and 1GB of swap. 'time ssh guest echo hello' from the host usually takes anywhere from 140ms to 1.2s. I wrote a C program that allocates a gigabyte (in two 512MB pieces) and churns it through swap by writing to random bytes. 'time ssh guest echo hello' now takes 4-8 seconds. The OOM killer didn't run once in the five minutes I ran the swap-churning process. Setting /proc/sys/vm/swappiness to 0 didn't change the symptoms; 'time ssh guest echo hello' still takes 4-8 seconds. This is on Linux 4.9.65.

If I crank the number of churning threads up from 1 to 8, the single 'time ssh guest echo hello' I tried took 33 seconds. I am not patient enough to see what happens with 64 churning threads, an entirely reasonable number, but I would expect the latency involved in rescuing the machine to cause any reasonable administrator to simply bounce it.

In this workload, the kernel is consistently failing to predict which pages are unimportant; the mispredictions are expensive; the RAM saved by swapping out bash, sshd, and killall (or whatever) is negligible; and the important work of allowing remote login to diagnose and clean up the mess is held up unconscionably long to make room for what, in practical instances, is a user error.

I ran 'swapoff -a' and then ran the same C program; it was killed almost immediately.