
	by api 329 days ago
	Wait... that means a misbehaving program can cause out of memory errors easily by filling up /tmp? That's a very bad default.

2 comments

CamouflagedKiwi 329 days ago

A misbehaving program can cause out of memory errors already by filling up memory. It wouldn't persist past that program's death but the effect is pretty catastrophic on other programs regardless.

o11c 329 days ago

That actually is a pretty big difference.

Assuming you're sane and have swap disabled (since there is no way to have a stable system with swap enabled), a program that tries to allocate all memory will quickly get OOM killed and the system will recover quickly.

If /tmp/ fills up your RAM, the system will not recover automatically, and might not even be recoverable by hand without rebooting. That said, systemd-managed daemons using a private /tmp/ in RAM will correctly clear it when killed.

tsimionescu 329 days ago

The sane thing is to have swap enabled. Having swap "disabled" forces your system to swap out executables to disk, since these are likely the only memory-mapped files you have. So, if your memory fills up, you get catastrophic thrashing of the instruction cache. If you're lucky, you really go over available memory, and the OOMKiller kills some random process. But if you're not, your system will keep chugging along at a snail's pace.

Perhaps disabling overcommit as well as swap could be safer from this point of view. Unfortunately, you get other problems if you do so - as very little Linux software handles errors returned by malloc, since it's so uncommon to not have overcommit on a Linux system.
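
For anyone who wants to check which overcommit policy a box is actually running, here is a minimal sketch, assuming a Linux system that exposes `/proc/sys/vm/overcommit_memory`. The helper name `describe_overcommit` is made up for illustration; the three mode meanings follow the kernel's documented values.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory as documented for the Linux kernel.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the usual default)",
    1: "always overcommit, never refuse an allocation",
    2: "strict accounting, allocations can fail up front",
}

def describe_overcommit(raw: str) -> str:
    """Translate the raw sysctl value into a human-readable policy."""
    return OVERCOMMIT_MODES.get(int(raw.strip()), "unknown mode")

if __name__ == "__main__":
    path = Path("/proc/sys/vm/overcommit_memory")
    if path.exists():
        print(describe_overcommit(path.read_text()))
    else:
        print("no overcommit sysctl here (not Linux?)")
```

Only mode 2 is the "overcommit disabled" case discussed above, where `malloc` can genuinely return an error.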

I'd also note that swap isn't even that slow for SSDs, as long as you don't use it for code.

tremon 328 days ago

> Having swap "disabled" forces your system to swap out executables to disk

Read-only pages are never written to swap, because they can be retrieved as-is from the filesystem already. Binaries and libraries are accounted as buffer cache, not used memory, and under memory pressure those pages are simply dropped, not swapped out. Whether you have swap enabled or disabled doesn't change that.

Still, I hope that Debian does the sane thing and sets proper size limits. I recall having to troubleshoot memory issues on a system (Ubuntu IIRC) a decade ago where they also extensively used tmpfs: /dev, /dev/shm, /run, /tmp, /var/lock -- except that all those were mounted with the default size, which is 50% of total RAM. And the size limit is per mountpoint...

tsimionescu 328 days ago

> under memory pressure those pages are simply dropped, not swapped out

This is just semantics. The pages are evicted from memory, knowing that they are backed by the disk, and can be swapped back in from disk when needed - behavior that I called "swapping out" since it's pretty similar to what happens with other memory pages in the presence of swap.

Regardless of the naming, the important part is what happens when the page is needed again. If your code page was evicted, when your thread gets scheduled again, it will ask for the page to be read back into memory, requiring a disk read; this will cause some other code page to be evicted; then a new thread will be scheduled - worst case, one that uses the exact code page that just got evicted, repeating the process. And since the scheduler will generally try to execute the thread that has been waiting the most, while the VMM will prefer to evict the oldest read pages, there is actually a decent chance that this exact worst case will happen a lot. This whole thing will completely freeze the system to a degree that is extremely unlikely for a system with a decent amount of swap space.
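
The worst case described above can be shown with a toy model. This is only a sketch: it assumes strict LRU eviction and one code page per thread, which is far simpler than what the real kernel does, and the function name `count_page_faults` is made up for illustration.

```python
from collections import OrderedDict

def count_page_faults(accesses, resident_capacity):
    """Count faults for an access sequence under strict LRU eviction."""
    resident = OrderedDict()  # page -> None, ordered oldest to newest use
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
            continue
        faults += 1
        if len(resident) >= resident_capacity:
            resident.popitem(last=False)  # evict the least recently used page
        resident[page] = None
    return faults

# Five threads, each needing its own code page, scheduled round-robin.
round_robin = [thread % 5 for thread in range(50)]

print(count_page_faults(round_robin, resident_capacity=5))  # 5: all pages fit
print(count_page_faults(round_robin, resident_capacity=4))  # 50: one page short
```

Being short by a single page turns 5 faults into 50 in this model, because the page about to be needed is always the one that was just evicted.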

pmontra 329 days ago

I've been running with swap off since my first SSD in 2015 or 2016. 16 GB RAM, then 32. No problems at all.

If I see RAM close to 30 GB I restart my browser and go back to 20 GB or less. Not every month.

progmetaldev 329 days ago

Are you running your own system for personal use, a service available to the public, or both? Do you normally see your system used consistently, or does it get used differently (and in random ways)?

Since you state you're running a browser, I assume you mean for personal use. Unfortunately, when you run a service open to the public, you can find all kinds of odd traffic even for normal low-memory services. Sometimes you'll get hit with an aggressive bot looking for an exploit, and a lot of those bots don't care if they get blocked, because they are built to absolutely crush a system with exploits or login attempts where they are only blocked by the system crashing.

I'd say that most bots are this aggressive, because the old school "script kiddies", or now AI-enabled aggressors, just run code without understanding things. It's easier than ever to run an attack against a range of IP addresses looking for vulnerabilities, and that can be chained into an LLM to generate code that can be run easily.

pmontra 328 days ago

That's my laptop. I'll check what my customers do on their servers, but all of them have a login screen on the home page of their services. Only one of them has a registration screen. Those are the servers I have access to. Their corporate sites run on WordPress and I don't know how those servers are configured.

Anyway, I'd also enable swap on public facing servers.

tsimionescu 328 days ago

Sure, if your working set always fits in RAM, you won't have problems. You wouldn't have problems with swap enabled, either.

It's only when you're consistently at the limit of how much RAM you have available that the differences start to matter. If you want to run a ~30GB ±10% workload on a system with 32GB of RAM, then you'll get to find out how stable it is with vs. without swap.

o11c 329 days ago

People keep saying this, yet infinite real-world experience shows that systems perform far better if the OOM Killer actually gets to kill something, which is only possible with swap disabled. In my experience, the OOM killer picks the right target first maybe 70% of the time, and the rest of the time it kills some other large process and allows enough progress for the blameworthy process to either complete or get OOM'ed in turn. In either case, all is good - whoever is responsible for monitoring the process notices its death and is able to restart it (automatically or manually - the usual culprits are: children of a too-parallel `make`, web browsers, children of systemd, or parts of the windowing environment [the WM and Graphical Shell can easily be restarted under X11 without affecting other processes; Wayland may behave badly here]). If you are launching processes without resilient management (this includes "bubble the failure up until my nth-grandparent handles it") you need to fix that before anything else.

With swap enabled, it is very, very, VERY common for the system to become completely unresponsive - no magic-sysrq, no ctrl-alt-f2 to login as root, no ssh'ing in ...

You also have some misunderstandings about overcommit. If you aren't checking `malloc` failure you have UB, but hopefully you will just crash (killing processes is a good thing when the system fundamentally can't fulfill everything you're asking of it!), and there's a pretty good chance the process that gets killed is blameworthy. The real problems are large processes that call `fork` instead of `vfork` (which is admittedly hard to use) or `posix_spawn` (which is admittedly limited and full of bugs), and processes that try to be "clever" and cache things in RAM (for which there's admittedly no good kernel interface).
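
To show the shape of the `posix_spawn` route, here is a minimal sketch using Python's `os.posix_spawn` wrapper rather than C. The helper name `spawn_and_wait` is made up for illustration, and whether the call avoids duplicating the parent's address space depends on the libc underneath, so treat this as an API illustration only.

```python
import os
import sys

def spawn_and_wait(argv):
    """Start argv[0] via posix_spawn and return the child's exit code."""
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    # None means the child did not exit normally (e.g. it was killed).
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

if __name__ == "__main__":
    # Use the running interpreter as a child that is known to exist.
    print(spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"]))
```

A `None` return is the case a supervisor has to care about here: the child was terminated by a signal, which is what an OOM kill looks like from the parent.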

===

"Swap isn't even that slow for SSDs" is part of the problem. All developers should be required to use an HDD with full-disk encryption, so that they stop papering over their performance bugs.

CoolCold 329 days ago

+1 from me for

> With swap enabled, it is very, very, VERY common for the system to become completely unresponsive - no magic-sysrq, no ctrl-alt-f2 to login as root, no ssh'ing in ...

It usually takes only a couple of incidents where you have to travel to a distant DC, or wait a couple of hours for someone to hook up IPMI, to learn "let it fail fast and gimme ssh back" in practice, versus the theory of "you should have swap on".

tsimionescu 328 days ago

Conversely, having critical processes get OOMKilled in critical sections can teach you the lesson that it's virtually impossible to write robust software with the assumption that any process can die at any instruction because the kernel thought it's not that important. OOM errors can be handled; SIGKILL can't.
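
The asymmetry can be seen from userland. A minimal sketch, assuming a POSIX system and that it runs in the main thread; the helper names `can_install_handler` and `allocate_or_fallback` are made up for illustration.

```python
import signal

def can_install_handler(signum):
    """Return True if the OS lets this process handle the signal."""
    try:
        previous = signal.signal(signum, lambda signo, frame: None)
    except (OSError, ValueError):
        return False
    if previous is not None:
        signal.signal(signum, previous)  # restore the original disposition
    return True

def allocate_or_fallback(n_bytes):
    """An allocation failure surfaces as an exception we can react to."""
    try:
        return bytearray(n_bytes)
    except MemoryError:
        return None  # caller can shed load, flush state, or retry smaller

print(can_install_handler(signal.SIGTERM))  # a handler is allowed
print(can_install_handler(signal.SIGKILL))  # the OS refuses this one
```

With overcommit enabled the `MemoryError` path rarely triggers, which is exactly why the process ends up facing SIGKILL instead.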

tsimionescu 328 days ago

My only point is that you should have at least a few GB of swap space to smooth out temporary memory spikes, possibly avoiding random processes getting killed at random times, and making it very unlikely that the system will evict your code pages when it's running close to, but below the memory limit. The OOMKiller won't kick in if you're below the limit, but your system will freeze completely - virtually every time the scheduler runs, one core will stall on a disk read.

Conversely, with a few GB of old data paged out to disk, even to a slow HDD, there is going to be much, much less thrashing going on. Chances are, the system will work pretty normally, since it's most likely that memory that isn't being used at all is what will get swapped out, so it's unlikely to need to be swapped in any time soon. The spike that caused you to go over your normal memory usage will die down, memory will get freed naturally, and the worst you'll see is that some process will have a temporary spike in latency some time later when it actually needs those swapped out pages.

Now, if the spike is too large to fit even in RAM + swap, the OOMKiller will still run and the system will recover that way.

The only situation where you'll get in the state you are describing is if virtually all of your memory pages are constantly getting read and written to, so that the VMM can't evict any "stale" pages to swap. This should be a relatively rare occurrence, but I'm sure there are workloads where this happens, and I agree that in those cases, disabling swap is a good idea.
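
To see how much has actually been paged out on a given machine, here is a rough sketch that parses `/proc/meminfo`-style text. The helper names are made up for illustration and the sample numbers are invented, not measurements.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of kB values."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def swap_summary(fields):
    """Return (swap_used_kb, mem_available_kb) from parsed fields."""
    used = fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)
    return used, fields.get("MemAvailable", 0)

# Made-up sample values, not measurements from any real machine.
SAMPLE = """MemTotal:       32768000 kB
MemAvailable:    2048000 kB
SwapTotal:       4194304 kB
SwapFree:        3145728 kB
"""

print(swap_summary(parse_meminfo(SAMPLE)))  # (1048576, 2048000)
```

On a real system the same functions can be fed the contents of `/proc/meminfo`. A steady, non-growing swap-used figure is the benign case described above; the number alone does not show whether pages are being swapped back in.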

> If you aren't checking `malloc` failure you have UB, but hopefully you will just crash (killing processes is a good thing when the system fundamentally can't fulfill everything you're asking of it!), and there's a pretty good chance the process that gets killed is blameworthy.

This is a very optimistic assumption. Crashing is about as likely as some kind of data corruption for these cases. Not to mention, crashing (or getting OOMKilled, for that matter) is very likely to cause data loss - a potentially huge issue. If you can avoid the situation altogether, that's much better. Which means overprovisioning and enabling some amount of swap if your workload is of a nature that doesn't constantly churn the entire working memory.

> "Swap isn't even that slow for SSDs" is part of the problem. All developers should be required to use an HDD with full-disk encryption, so that they stop papering over their performance bugs.

You're supposed to design software for the systems you actually target, not some lowest common denominator. If you're targeting use cases where the software will be deployed on 5400 RPM HDDs with full disk encryption at rest running on an Intel Celeron CPU with 512 MB of RAM, then yes, design your system for those constraints. Disable swap, overcommit too, probably avoid any kind of VM technology, etc.

But don't go telling people who are designing for servers running on SSDs to disable swap because it'll make the system unusably slow - it just won't.

NekkoDroid 329 days ago

> If /tmp/ fills up your RAM

tmpfs by default only uses up to half your available RAM unless specified otherwise. So this isn't really a consideration unless you configure it to be a consideration you need to take into account.

(Systemd also really recently (v258) added quotas to tmpfs and IIRC it's set by default to 80% of the tmpfs, so it is even less of a problem)

tremon 328 days ago

  $ grep tmpfs /proc/mounts
  udev /dev devtmpfs [..]
  tmpfs /run tmpfs [..]
  tmpfs /run/lock tmpfs [..]
  tmpfs /run/shm tmpfs [..]
  tmpfs /tmp tmpfs [..]
  cgroup_root /sys/fs/cgroup tmpfs  [..]

If each of those can take up 50% of ram, this is still a big problem. I don't know what defaults Debian uses nowadays, because I have TMPFS_SIZE=1% in /etc/default/tmpfs so my system is explicitly non-default.
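
A quick way to audit this on any box is to list the tmpfs mounts and see which ones carry an explicit `size=` option. A minimal sketch: the helper name `tmpfs_mounts` and the sample text are made up, and it assumes a mount with no `size=` shown is using the default limit.

```python
def tmpfs_mounts(mounts_text):
    """Return (mountpoint, size_option) for each tmpfs line in /proc/mounts text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "tmpfs":
            continue  # skips devtmpfs and every non-tmpfs filesystem
        options = fields[3].split(",")
        size = next((o[5:] for o in options if o.startswith("size=")), None)
        found.append((fields[1], size))
    return found

# Made-up sample in /proc/mounts format, not taken from a real system.
SAMPLE = """tmpfs /run tmpfs rw,nosuid,size=1632000k,mode=755 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
/dev/sda1 / ext4 rw,relatime 0 0
udev /dev devtmpfs rw,nosuid 0 0
"""

mounts = tmpfs_mounts(SAMPLE)
unlimited_by_option = [m for m, size in mounts if size is None]
print(mounts)
print(f"worst case if each defaults to 50% of RAM: {len(unlimited_by_option) * 50}%")
```

Feeding it the real `/proc/mounts` contents gives the number of mounts that could each grow to the default cap.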

NekkoDroid 328 days ago

Sure, but counterpoint: if a process is already writing that much in multiple of those directories, who knows what it's writing in other directories that aren't backed by RAM.

mxey 329 days ago

https://chrisdown.name/2018/01/02/in-defence-of-swap.html

o11c 329 days ago

An ivory tower answer.

All those arguments would be useful if we somehow could avoid the fact that the system will use it as "emergency memory" and become unresponsive. The kernel's OOM killer is broken for this, and userland OOM daemons are unreliable. `vm.swappiness` is completely useless in the worst case, which is the only case that matters.

With swap off, all the kernel needs to do is reserve a certain threshold for disk cache to avoid the thrashing problem. I don't know what the kernel actually does here (or what its tunables are), because systems with swap off have never caused problems for me the way systems with swap on inevitably do. The OOM killer works fine with swap off, because a system must always be resilient to unexpected process failure.

And worst of all - the kernel requires swap (and its bugs) to be enabled for hibernation to work.

It really wouldn't be hard to design a working swap system (just calculate how much to keep of different purposes of swap, and launch the OOM killer earlier), but apparently nobody in kernel-land understands the real-world problems enough to bother.

em-bee 329 days ago

> the kernel requires swap (and its bugs) to be enabled for hibernation to work

this one gets me irritated every time i think about it. i don't want to use swap, but i do want hibernation. why is there no way to disable swap without losing that?

hmm, i suppose one could write a script that enables an inactive swap partition just before shutdown, and disables it again after boot.
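
a rough, untested sketch of that idea. the device path is hypothetical, the hook wiring (whatever runs before hibernate and after resume) is not shown, and it defaults to a dry run because `swapon`/`swapoff` need root.

```python
import subprocess

# Hypothetical device path; substitute the real hibernation partition.
HIBERNATE_SWAP = "/dev/disk/by-label/hibernate-swap"

def swap_command(action, device=HIBERNATE_SWAP):
    """Build the swapon/swapoff argv for the given action."""
    tools = {"enable": "swapon", "disable": "swapoff"}
    return [tools[action], device]

def run(action, dry_run=True):
    """Print the command, and execute it only when dry_run is False (needs root)."""
    argv = swap_command(action)
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)

run("enable")   # would go in a pre-hibernate hook
run("disable")  # would go in a post-resume hook
```

the catch is the window between `swapon` and the actual hibernate, during which the kernel is free to use the partition as ordinary swap.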

sigio 329 days ago

I never want to use hibernation, since then I have to re-enter my disk encryption passphrase at resume time, and have to wait longer for both suspend and resume because it needs to sync up to 48GB to/from disk (and I don't want to waste 48GB of disk space on swap space for hibernation). Suspend to RAM is fine: I can keep the system suspended for a couple of days without issues, but it only needs to survive a long weekend at most. Resume from RAM is about instant, and then just needs a screensaver unlock to get back to work.

nonameiguess 329 days ago

User paulv already posted this 3 hours ago in a comment currently lower than this one, but tmpfs by default can't use all of your RAM. /tmp can get filled up and be unavailable for anything else to write to, but you'll still have memory. It won't crash the entire system.

bigstrat2003 329 days ago

> Assuming you're sane and have swap disabled (since there is no way to have a stable system with swap enabled)

What the heck are you talking about? Swap is enabled on every Linux system I manage (servers, desktop etc) and it's perfectly stable.

paulv 329 days ago

The default configuration for tmpfs is to "only" use 50% of physical ram, which still isn't great, but it's something.

foresto 329 days ago

To be clear, that 50% (or whatever you configure) is a limit, not a constant.
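
The difference between the limit and what is actually consumed is easy to check with `statvfs`. A minimal sketch; the helper name `usage_vs_limit` is made up, and the temp directory it inspects is only a tmpfs if the system is configured that way.

```python
import os
import tempfile

def usage_vs_limit(path):
    """Return (used_bytes, limit_bytes) for the filesystem holding path."""
    st = os.statvfs(path)
    limit = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return used, limit

tmp_dir = tempfile.gettempdir()
used, limit = usage_vs_limit(tmp_dir)
print(f"{tmp_dir}: {used} of {limit} bytes used")
```

On a tmpfs mount, `limit` is the configured cap and `used` is what is actually occupied right now, so an empty /tmp costs close to nothing.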
