by man8alexd 138 days ago

> Thrashing is the penalty for using too much swap. I was saying there is no penalty for having a lot of swap available, but unused.

Unless you overprovision memory on a machine or have carefully set cgroup limits for all workloads, you are going to have a memory leak and your large unused swap is going to be used, leading to swap thrashing.

> the OOM killer laying waste to the system. Out of those two choices I prefer the system running slowly.

In a swap thrashing event, the system isn't just running slowly but totally unresponsive, with an unknown chance of recovery. The majority of people prefer the OOM killer to an unresponsive system. That's why we got the OOM killer in the first place.

> If load ramps up gradually you get a gradual slowdown until the working set is badly exceeded, then it falls off a cliff.

The random access latency difference between RAM and SSD is about 10^3. When the active working set spills out into swap, a linear increase in swap utilization leads to orders-of-magnitude performance degradation. Assuming random access, simple math gives that a 0.1% excess causes a 2x degradation, 1% a 10x degradation, and 10% a 100x degradation.
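The arithmetic behind those figures can be sketched as a weighted average. This is only a back-of-the-envelope model, assuming uniformly random access, a RAM access cost of 1, and a swap access cost of 10^3; the function name `slowdown` and its parameters are made up for illustration.

```python
def slowdown(spill_fraction, latency_ratio=1000.0):
    """Average access cost relative to a working set that fits fully in RAM.

    spill_fraction: share of random accesses that land in swap (0.0 to 1.0).
    latency_ratio:  assumed swap/RAM latency ratio (10^3 per the comment above).
    """
    ram_share = 1.0 - spill_fraction
    return ram_share * 1.0 + spill_fraction * latency_ratio


for pct in (0.1, 1.0, 10.0):
    print(f"{pct}% of accesses in swap -> about {slowdown(pct / 100):.1f}x slower")
```

Under this model the cost is 1 + p * (ratio - 1), so every extra 0.1% of accesses that hit swap adds roughly one full "RAM-only" runtime on top.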

> A seminal paper on the subject: https://dl.acm.org/doi/pdf/10.1145/362342.362356

This paper discusses measuring stable working sets and says nothing about performance degradation when your working set increases.

> https://yeet.cx/r/ayNHrp5oL0.

WTF is this graph supposed to demonstrate? Some workload went from 0% to 100% of swap utilization in 30 seconds and got OOM-killed. This is not going to happen with a large swap.

> Once swapping starts latency increases gradually as more and more workers are swapped in and out while they wait for clients and the database

In practice, you never see constant or gradually increasing swap I/O in such systems. You either see zero swap I/O with occasional spikes due to incoming traffic or total I/O saturation from swap thrashing.

> Your options are get your desktop app randomly killed by the OOM killer and perhaps lose your work, or the system slows to a crawl and you take corrective action like closing the offending app.

You seem to be unaware that swap thrashing events are frequently unrecoverable, especially with a large swap. It is better to have a typical culprit like Chrome OOM-killed than to press the reset button and risk filesystem corruption.

rstuart4133 138 days ago

> Unless you overprovision memory on a machine or have carefully set cgroup limits for all workloads, you are going to have a memory leak and your large unused swap is going to be used, leading to swap thrashing.

You seem to be very certain about that inevitable memory leak. I guess people can make their own judgements about how inevitable they are. I can't say I've seen a lot of them myself.

But the next bit is total rubbish. A memory leak does not lead to thrashing. By definition if you have a leak the memory isn't used, so it goes to swap and stays there. It doesn't thrash. What actually happens if the leak continues is swap eventually fills up, and then the OOM killer comes out to play. Fortunately it will likely kill the process that is leaking memory.

I've used this behaviour to find which process had a slow leak (it had to be running for months). This has only happened once in decades mind you - these leaks aren't that common. You allocate a lot of swap, and gradually it is filled by the process that has the leak. Because swap is so large, once the leaking process fills it, it stands out like dog's balls because its memory consumption is huge.

You notice all of this because, like all good sysadmins, you monitor swap usage and receive alerts when it gets beyond what is normal. But you have time - the swap is large, the system slows down during peaks but recovers when they are over. It's annoying, but not a huge issue.
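A minimal sketch of that kind of swap check on Linux, assuming the usual `SwapTotal` / `SwapFree` lines in `/proc/meminfo`. The 25% threshold and the function names are placeholders, not anything from a particular monitoring tool.

```python
def swap_used_fraction(meminfo_text):
    """Return used swap as a fraction of total swap, from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        return 0.0  # no swap configured
    return (total - free) / total


def check_swap(threshold=0.25, path="/proc/meminfo"):
    """Return (used_fraction, over_threshold) for the running system."""
    with open(path) as f:
        used = swap_used_fraction(f.read())
    return used, used > threshold


if __name__ == "__main__":
    used, alert = check_swap()
    print(f"swap used: {used:.0%}" + (" (above threshold)" if alert else ""))
```

Run from cron or a monitoring agent, something like this gives the early warning described above; what counts as "normal" swap usage depends on the workload.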

> In a swap thrashing event, the system isn't just running slowly but totally unresponsive

Again, you seem to be very certain about this. Which is odd, because I've logged into systems that were thrashing, which means they didn't meet my definition of "totally unresponsive". In fact I could only log in because the OOM killer had freed some memory. The first couple of times the OOM killer took out sshd and I had to reach for the reset button, but I got lucky one day and could log in. The system was so slow it was unusable for most purposes - but not for the one thing I needed, which was to find out why it had run out of memory. Maybe we have different definitions of "totally", but to me that isn't "totally". In fact if you catch it before the OOM killer fires up and kills god knows what, these "totally unresponsive systems" are salvageable without a reboot.

> This paper discusses measuring stable working sets and says nothing about performance degradation when your working set increases.

Fair enough. Neither link was good.

> You seem to be unaware that swap thrashing events are frequently unrecoverable, especially with a large swap.

Perhaps some of them are, but for me it wasn't the swapping that did the system in. It is always the OOM killer.

> It is better to have a typical culprit like Chrome OOM-killed than to press the reset button and risk filesystem corruption.

The OOM killer on the other hand leaves the system in some undefined state. Some things are dead. Maybe you got lucky and it was just Chrome that was killed, but maybe your sound, bluetooth, or DNS daemons have gone AWOL and things just behave weirdly. Despite what you say, the reset button won't corrupt modern journaled filesystems as they are pretty well debugged. But applications are a different story. If they get hit by a reset or the OOM killer while they are saving your data and aren't using sqlite as their "fopen()", they can wipe the file you are working on. You don't just lose the changes. The entire document is gone. This has happened to me.

I'd take the system taking a few minutes to respond to my request to kill a misbehaving application over the OOM killer any day.

man8alexd 138 days ago

> You seem to be very certain about that inevitable memory leak.

It is fashionable to disable swap nowadays because everyone has been bitten by a swap thrashing event. Read other comments.

> A memory leak does not lead to thrashing. By definition if you have a leak the memory isn't used, so it goes to swap and stays there.

You assume that leaked memory is inactive and goes to swap. This is not true. Chrome, GNOME, and other modern Linux desktop apps leak a lot, and the leaked memory stays in RSS, pushing everything else into swap.

> if the leak continues is swap eventually fills up, and then the OOM killer comes out to play

You assume that the OOM killer comes out to play in time. The larger the swap, the longer it takes for the OOM killer to trigger, if it ever does. The kernel OOM killer is unreliable, which is why we have a collection of other tools like earlyoom, Facebook's oomd and systemd-oomd.

> I've logged into systems that were thrashing

It means that the system wasn't out of memory yet. When it is unresponsive, you won't be able to enter commands into an already open shell. See other comments here for examples.

> The OOM killer on the other hand leaves the system in some undefined state. Some things are dead. Maybe you got lucky and it was just Chrome that was killed, but maybe your sound, bluetooth, or DNS daemons have gone AWOL and things just behave weirdly.

This is not true. By default, the kernel OOM killer selects the single largest process in the system (measured by its RSS + swap). By default, systemd, ssh and other socket-activated systemd units are protected from OOM.
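To get a feel for which process that rule would point at, here is a rough sketch that ranks processes by `VmRSS` + `VmSwap` from `/proc/<pid>/status`. It is only an approximation of the kernel's choice: the real badness score also counts page tables and applies `oom_score_adj`. The helper names are illustrative.

```python
import os


def rss_plus_swap_kb(status_text):
    """Parse /proc/<pid>/status text; return (name, VmRSS + VmSwap in kB)."""
    sizes = {"VmRSS": 0, "VmSwap": 0}  # kernel threads have neither field
    name = "?"
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key == "Name":
            name = rest.strip()
        elif key in sizes:
            sizes[key] = int(rest.split()[0])
    return name, sizes["VmRSS"] + sizes["VmSwap"]


def largest_processes(proc_root="/proc", top=5):
    """Return the top processes as (kB, pid, name), largest first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, kb = rss_plus_swap_kb(f.read())
        except OSError:
            continue  # process exited or is not readable
        rows.append((kb, int(entry), name))
    return sorted(rows, reverse=True)[:top]


if __name__ == "__main__":
    for kb, pid, name in largest_processes():
        print(f"{kb:>12} kB  pid {pid:<8} {name}")
```

The kernel's own per-process figure is exposed in `/proc/<pid>/oom_score`, which is the better thing to read if you want to know what it would actually pick.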

rstuart4133 137 days ago

> It is fashionable to disable swap nowadays because everyone has been bitten by a swap thrashing event.

If they disable swap they will get hit by the OOM killer. You seem to prefer it over slowing down. I guess that's a personal preference. However, I think it is misleading to say people are being bitten by a swap thrashing event. The "event" was them running out of RAM. Unpleasant things will happen as a consequence. Blaming thrashing or the OOM killer for the unpleasant things is misleading.

> You assume that leaked memory is inactive and goes to swap. This is not true.

At best, you can say "it's not always true". It's definitely gone to swap in every case I've come across.

> It means that the system wasn't out of memory yet.

Of course it wasn't out of memory. It had lots of swap. That's the whole point of providing that swap - so you can rescue it!

> When it is unresponsive, you won't be able to enter commands into an already open shell.

Again that's just plain wrong. I have entered commands into a system that is thrashing. It must work eventually if thrashing is the only thing going on, because when the system thrashes the CPU utilization doesn't go to 0. The CPU is just waiting for disk I/O after all, and disk I/O is happening at a furious pace. There's also a finite amount of pending disk I/O. Provided no new work is arriving (time for a cup of coffee?) it will get done, and the thrashing will end.

If the system does die other things have happened. Most likely the OOM killer if they follow your advice, but network timeouts killing ssh and networked shares are also a thing. If you are using Windows or macOS, the swap file can grow to fill most of free disk space, so you end up with a double whammy.

Which brings me to another observation. In desktop OSes, the default is to provide swap, and lots of it. In Windows swap will grow to 3 times RAM. This is pretty universal - even Debian will give you twice RAM for small systems. The people who decided on that design choice aren't following some folklore they read in some internet echo chamber. They've used real data. They've observed that when swap starts being used, systems do slow down, giving the user some advance warning, and that when thrashing starts, systems can recover rather than die, which gives the user an opportunity to save work. It is the right design tradeoff IMO.

> By default, the kernel OOM-killer selects one single largest (measured by its RSS+swap) process in the system.

Yes, it does. And if it is a single large process hogging memory you are in luck - the OOM killer will likely do the right thing. But Chrome (and now Firefox) is not a single large process. Worse, if the out-of-memory condition is caused by, say, someone creating zillions of logins, they are so small that they are the last thing the OOM killer chooses. Shells, daemons, all sorts of critical things go first. The "largest process first" rule is just a heuristic, one which can be and in my case has been wrong. Badly wrong.

man8alexd 135 days ago

> You seem to prefer it over slowing down.

An unresponsive system is not a slowdown. You keep ignoring that.

>> You assume that leaked memory is inactive and goes to swap. This is not true.

> At best, you can say "it's not always true".

You skipped my sentence that was specifying the scope when "it's not always true", and now you pretend that I'm making a categorical generalized statement. This is a silly attempt at a "strawman".

>> It means that the system wasn't out of memory yet.

> Of course it wasn't out of memory. It had lots of swap. That's the whole point of providing that swap - so you can rescue it!

Swap is not RAM. When the free RAM is below the low watermark, the kernel switches to direct reclaim and blocks tasks that require free memory pages. Blocking of tasks happens regardless of swap. If you are able to log in and fork a new process, the system is not below the low watermark.
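One rough way to see where a machine sits relative to those watermarks is to compare each zone's free page count with the `min`, `low` and `high` values in `/proc/zoneinfo`. The sketch below assumes the usual layout of that file (a `Node N, zone NAME` header followed by `pages free`, `min`, `low`, `high` lines); `zone_watermarks` is an illustrative helper, not a kernel or library API.

```python
def zone_watermarks(zoneinfo_text):
    """Parse /proc/zoneinfo text into a list of per-zone dicts (values in pages)."""
    zones = []
    current = None
    for line in zoneinfo_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "Node" and "zone" in tokens:
            current = {"zone": " ".join(tokens)}
            zones.append(current)
        elif current is None:
            continue
        elif tokens[:2] == ["pages", "free"] and len(tokens) == 3:
            current.setdefault("free", int(tokens[2]))
        elif tokens[0] in ("min", "low", "high") and len(tokens) == 2:
            # Keep only the first occurrence per zone (the watermark lines).
            current.setdefault(tokens[0], int(tokens[1]))
    return zones


def zones_below_low(zones):
    """Return the zones whose free page count is under the low watermark."""
    return [z for z in zones if "free" in z and "low" in z and z["free"] < z["low"]]


if __name__ == "__main__":
    with open("/proc/zoneinfo") as f:
        parsed = zone_watermarks(f.read())
    for z in parsed:
        print(z)
    print("below low watermark:", [z["zone"] for z in zones_below_low(parsed)])
```

This is a point-in-time snapshot; free pages move quickly under memory pressure, so a single reading says little about how long tasks have been stalled.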

>> When it is unresponsive, you won't be able to enter commands into an already open shell.

> Again that's just plain wrong.

You are in denial.

> Provided no new work is arriving (time for a cup of coffee?) it will get done, and the thrashing will end.

This is false. A system can stay unresponsive much longer than a cup of coffee. There is no guarantee that the thrashing will end in a reasonable time.

> even Debian will give you twice RAM for small systems.

> The people who decided on that design choice aren't following some folklore they read in some internet echo chamber.

That 2x RAM rule is exactly that - old folklore. You can find it in SunOS/AIX/etc manuals or Usenet FAQs from the 80s and early 90s, before Linux existed.

> They've used real data.

You're hallucinating like an LLM. No one did any research or measurements to justify that 2x rule in Linux.
