Hacker News
by caf 2515 days ago
There _NEEDS_ to be a setting where program pages don't get pushed out for disk cache, period, unless approaching a low-RAM situation, but BEFORE it causes long periods of total crushing unresponsiveness.

Here's the thing: a mapped program page is just another page in the page cache. Now, you could maybe say that "any page cache page that is mapped into at least one process will be pinned", but the problem there is that any unprivileged process could then pin an unlimited amount of memory, which is an obvious non-starter.
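
For comparison, explicit pinning through mlock(2) is already capped for unprivileged processes by RLIMIT_MEMLOCK, and a blanket "mapped means pinned" rule would sidestep that kind of bound unless the pinned pages were also charged against it. A minimal sketch that reads the limit, assuming a Unix-like system where Python's resource module exposes RLIMIT_MEMLOCK (the helper names are mine, for illustration only):

```python
import resource


def memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK pair for this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def describe(limit):
    """Render a limit value; RLIM_INFINITY means there is no cap."""
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


if __name__ == "__main__":
    soft, hard = memlock_limit()
    print("soft:", describe(soft), "hard:", describe(hard))
```

On a typical desktop install the soft limit for an ordinary user is small compared to total RAM, which is the point: the kernel does not let unprivileged code lock arbitrary amounts of memory.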

A workable alternative might be to add an extended file attribute like 'privileged.pinned_mapping', which if set indicates that any pages of the file that have active shared mappings are pinned. That means the superuser can go along and mark all the normal executables in this way, and the worst-case memory consumption a user can cause is limited by the total size of all the executables marked in this way that the user has access to.
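
To make that bound concrete, here is a rough sketch of the arithmetic. Everything in it is an assumption for illustration: the 'privileged.pinned_mapping' attribute does not exist today, the 4096-byte page size is just a common value, and the file list is made up.

```python
import math
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size in bytes


@dataclass(frozen=True)
class FileInfo:
    path: str
    size: int          # file size in bytes
    pinned_attr: bool  # hypothetical 'privileged.pinned_mapping' attribute, set by root
    accessible: bool   # whether the user in question is allowed to map this file


def worst_case_pinned_bytes(files):
    """Upper bound on what one user can pin: every page of every marked file they can map."""
    total_pages = sum(
        math.ceil(f.size / PAGE_SIZE)
        for f in files
        if f.pinned_attr and f.accessible
    )
    return total_pages * PAGE_SIZE


# Made-up example data; sizes are arbitrary.
files = [
    FileInfo("/usr/bin/firefox", 10_000, True, True),          # marked and mappable
    FileInfo("/usr/sbin/root-only-tool", 5_000, True, False),  # marked, but not accessible
    FileInfo("/home/user/big.dat", 1_000_000, False, True),    # accessible, but not marked
]

print(worst_case_pinned_bytes(files))
```

Only the first file counts, so the bound is three pages. The large unmarked data file contributes nothing, however many times the user maps it.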

2 comments

SuSE solves this in their SuSE Linux Enterprise Server (SLES) with a new sysctl tunable, which soft-limits the size of the page cache.

https://www.suse.com/documentation/sles-for-sap-12/book_s4s/...

It is quite effective, although historically there have been issues with bugs causing server lockups in the kernel code around this tunable. It seems to be quite stable in SLES 15, however.

While the tunable is available in their regular SLES product, it is only supported in "SLES for SAP". The two share the same kernel, which is probably why the tunable is present in both.

There's no reason extra data cannot be added to entries in the page cache to make smarter decisions. That’s how Windows and OS X do it in their equivalent subsystems.

Nobody is suggesting these pages be pinned, which is an extreme measure.

The problem I'm trying to point out here is that if the extra metadata in the page cache is entirely under user control (for example "is mapped shared" and/or "is mapped executable") then it amounts to a user-specified QOS flag.

That might be OK on a single-user system but it doesn't fly on a multi-user one. That's why I suggested you could gate that kind of thing behind some kind of superuser control.

Why can’t a user make QoS decisions for their own pages? Root controlled pages should obviously have higher priority.

The kernel could still “fairly” evict pages across users - just letting them choose which N pages they prefer to go first.

Why can’t a user make QoS decisions for their own pages?

Because then you just get everyone asking for maximum QOS / don't-page-me-out on everything they can.

The pages in the page cache are not owned by a particular user, they're shared. If there are three users running /usr/bin/firefox, they'll all have shared read-only executable mappings of the same page cache pages. If you do a buffered read of a file immediately after I do the same, we both get our data copied from the same page cache page. So it's not at all clear how you'd do the accounting on this to implement that user-based fairness criterion.
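
A toy model of the accounting problem, to show why it's ambiguous. The page snapshot, the user names, and both charging policies are invented for illustration; neither is something the kernel is known to do in this form.

```python
from collections import defaultdict

# Hypothetical snapshot: page cache page -> users touching it, in order of first access.
page_users = {
    "firefox:0": ["alice", "bob", "carol"],
    "firefox:1": ["alice", "bob", "carol"],
    "report.pdf:0": ["bob"],
}


def charge_first_toucher(pages):
    """Charge each page entirely to whoever brought it into the cache."""
    charges = defaultdict(float)
    for users in pages.values():
        charges[users[0]] += 1.0
    return dict(charges)


def charge_split_evenly(pages):
    """Split the cost of each page equally among everyone currently using it."""
    charges = defaultdict(float)
    for users in pages.values():
        for user in users:
            charges[user] += 1.0 / len(users)
    return dict(charges)


first = charge_first_toucher(page_users)
split = charge_split_evenly(page_users)
print("first toucher:", first)
print("split evenly: ", split)
```

Under the first policy alice carries both firefox pages and carol is charged nothing, even though all three run the same binary. Under the second, bob ends up with the largest share. Any per-user eviction fairness scheme has to pick one of these (or something else), and each choice changes whose pages go first.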