Hacker News
by makmanalp 5185 days ago
Because most data is critical and you can't afford to just drop it on the ground whenever you please. A better option would be for the application/db to have its own swap routines, optimized for its own purposes, rather than letting the OS apply a catch-all swap method.
4 comments

Even beyond that: it's desirable for a machine to be able to compute arbitrarily large data sets. If the data set can't efficiently fit in memory, the machine should still make progress, just more slowly, using disk.

It is not desirable for a machine to have a "wall" which, upon being hit, becomes a harsh restriction on its capabilities. This is because we often encounter the "wall" unexpectedly, at a time that might be critical.

But they still have a wall; it just takes a bit more to hit it. On Linux systems swap is usually sized at 2x memory. With swap set like that, all swap does is raise the wall to 3 times what it previously was.
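
To spell out the arithmetic, a tiny sketch. The 8 GiB machine and the function name are made-up examples; the only thing taken from the comment above is the 2x swap ratio.

```python
def wall_gib(ram_gib, swap_ratio=2.0):
    """Total memory (RAM + swap) the machine can back before it hits the wall."""
    return ram_gib + swap_ratio * ram_gib


ram = 8.0  # assumed example machine with 8 GiB of RAM
print(wall_gib(ram))        # wall with swap sized at 2x RAM
print(wall_gib(ram) / ram)  # how many times higher than the RAM-only wall
```

The ratio is 3 regardless of how much RAM the box has, which is the point: the wall moves, it does not go away.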

But for a lot of systems your service will fail shortly after you start swapping anyway, because the performance cost of swapping is so high that it often starts a death spiral (can't handle enough requests, so they start piling up, eating even more memory, until your system dies or you hit connection limits etc.).
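
A rough way to see the death spiral is a toy queue model. Everything here is an assumption for illustration: the tick-based loop, the throughput numbers, and the rule that throughput drops tenfold once the backlog no longer fits in RAM. It is not a model of any real server.

```python
def simulate_backlog(arrivals, capacity_for):
    """Return the number of queued requests after each tick.

    arrivals     -- requests arriving in each tick
    capacity_for -- function mapping the current backlog to how many
                    requests can be served this tick
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        backlog += arriving
        backlog -= min(backlog, capacity_for(backlog))
        history.append(backlog)
    return history


def all_in_ram(backlog):
    # Working set always fits in RAM: constant throughput.
    return 100


def swaps_when_busy(backlog, ram_limit=50):
    # Assumed toy rule: once more than ram_limit requests are in flight,
    # the box starts paging and throughput collapses.
    return 100 if backlog <= ram_limit else 10


# Steady load of 40 requests per tick with a single burst of 80.
arrivals = [40] * 5 + [80] + [40] * 5
print(simulate_backlog(arrivals, all_in_ram))
print(simulate_backlog(arrivals, swaps_when_busy))
```

In the first run the backlog stays at zero through the burst. In the second, the one burst pushes the system over the limit, throughput falls below the normal arrival rate, and the backlog grows every tick afterwards even though load is back to normal.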

So "best case" in a typical configuration is that the wall is a bit higher. Worst case you gain nothing at all from the swap.

Personally I treat it as a failure if we ever hit swap - it means connection limits etc. have been set too high.

"Personally I treat it as a failure if we ever hit swap"

/agree. but still a useful feature.

The degraded performance a system will show when it starts hitting disk instead of memory is a great 'soft' failure.

I think it is good to have gradations. Going from 'OK' to 'Damn-this-is-slow' before 'Fail' is handy.

That's not why you need swap. Swap is because many applications will use memory when starting and then never touch it again.

You can therefore swap it out and use the extra memory for cache.

Most long term applications only need a small fraction of their startup memory.

Then surely they could free it?
If you allocate memory "after" that memory, it's not possible to return the earlier memory to the OS.

Also, suppose you need the memory only for startup and shutdown (things like logfiles, network connections, command line parsing, etc).

Yes it is. Memory allocators are heaps not queues.

Things like network connections and logfiles are used all the time, so they won't be swapped out (actually file handles are kernel side so never swapped anyway). You can free the command line parse after setting the options.

And clean shutdown is overrated: long running programs can just terminate fairly gracelessly if necessary, the OS cleans everything up.
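
On the "can it be freed" disagreement above, a toy model may help separate two things: memory the allocator can reuse, and memory that goes back to the OS. This is a sketch of a brk-style heap that can only shrink from the top; `ToyHeap` and its methods are invented for illustration and do not describe any particular allocator. Real allocators are more capable (glibc, for instance, serves large requests with mmap, and those can be returned independently).

```python
class ToyHeap:
    """Toy brk-style heap: blocks sit in address order, and only free
    space at the very top can be handed back to the OS."""

    def __init__(self):
        self.blocks = []  # [size, in_use] pairs in address order

    def malloc(self, size):
        # First fit: reuse any free block that is big enough.
        for index, (block_size, in_use) in enumerate(self.blocks):
            if not in_use and block_size >= size:
                self.blocks[index][1] = True
                return index
        # Otherwise grow the heap at the top.
        self.blocks.append([size, True])
        return len(self.blocks) - 1

    def free(self, index):
        self.blocks[index][1] = False
        # Trim: release only the free blocks at the top of the heap.
        while self.blocks and not self.blocks[-1][1]:
            self.blocks.pop()

    def held_from_os(self):
        return sum(size for size, _ in self.blocks)


heap = ToyHeap()
startup = heap.malloc(1000)  # big startup-only allocation
later = heap.malloc(10)      # small long-lived allocation placed "after" it
heap.free(startup)
print(heap.held_from_os())   # still 1010: the hole is below a live block
reused = heap.malloc(500)    # but the allocator can reuse the hole
heap.free(reused)
heap.free(later)
print(heap.held_from_os())   # 0: the top is free, so the heap can shrink
```

In this model both comments are partly right: the freed startup block is immediately reusable by the process, but the OS does not get it back while a later allocation sits above it.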

I like the Erlang technique of just dying when something bad happens. That is, if you write your app right then you can afford to just die at any point (without dropping any data on the floor at all). I agree with the grandparent--I've started to disable swap on my production servers because I reason that if I run out of RAM then I haven't configured something correctly. A real server shouldn't ever swap--heavy swapping grinds the whole world to a halt, which means requests are being serviced way too slowly, if at all...
An application that is aware of the half dozen or so caching layers from register to platter can perform dramatically better than a naive program. Two wrinkles:

1) it needs to be told the various sizes, speeds, and quirks on each server to make best use of them. (just some work)

2) it needs to coordinate with the other processes running on the system to divide up the resources. This is hard. Generally people bail and just assign some share of RAM and hope for the best with the other layers.

I agree of course, but I think the current situation is worse. Applications can allocate more than the amount of physical memory, at which point the system can become unusable. If on the other hand an allocation would have been refused at an earlier stage, there would have been no critical data to drop.

I guess I want to argue that with the currently typical amounts of RAM, all critical data should fit in RAM and stay there. The idea of virtual memory was to abstract over the difference between RAM and disk, but perhaps this has become a harmful abstraction now that RAM is big enough while the disadvantage of slow disks remains. RAM and disks are fundamentally different parts of the memory hierarchy, and should be treated completely differently by applications.

There's also copy-on-write to consider. If you actually had to allocate real memory-backed pages for every allocation a process made or had the right to, even the ones it never modified, the size of each process would be a lot larger. For example, if a process consumes 1GB of memory and forks a child, the child has access to the same 1GB of memory, but it doesn't really consume 1GB of memory. It has mapped a bunch of copy-on-write pages from its parent, and when either of them next modifies those pages, the real memory is allocated.

If they never modify those CoW pages, they can both happily keep using the same copy of the page in memory, and you can have two 1GB processes using a total of e.g. 1.01GB of real memory.
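
The accounting can be sketched with a toy model. This is only bookkeeping, not how a kernel implements it: `ToyMemory`, the page-table-as-list representation, and the choice of 1000 pages to stand in for 1GB are all assumptions made for the example.

```python
class ToyMemory:
    """Toy copy-on-write accounting: frames are shared until written."""

    def __init__(self):
        self.refcount = {}  # frame id -> number of page tables mapping it
        self.next_frame = 0

    def _new_frame(self):
        frame = self.next_frame
        self.next_frame += 1
        self.refcount[frame] = 1
        return frame

    def allocate(self, n_pages):
        # A page table is just a list of frame ids, one per virtual page.
        return [self._new_frame() for _ in range(n_pages)]

    def fork(self, page_table):
        # The child maps the same frames; nothing is copied yet.
        for frame in page_table:
            self.refcount[frame] += 1
        return list(page_table)

    def write(self, page_table, page_no):
        frame = page_table[page_no]
        if self.refcount[frame] > 1:
            # Shared frame: the writer gets a private copy.
            self.refcount[frame] -= 1
            page_table[page_no] = self._new_frame()

    def frames_in_use(self):
        return len(self.refcount)


mem = ToyMemory()
parent = mem.allocate(1000)   # 1000 pages standing in for "1GB"
child = mem.fork(parent)
print(mem.frames_in_use())    # 1000: two processes, one copy
for page_no in range(10):     # child dirties 1% of its pages
    mem.write(child, page_no)
print(mem.frames_in_use())    # 1010, i.e. 1.01x the original
```

Writing to a page that has already been copied costs nothing further, since its frame is no longer shared.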

This is very useful in practice, but it means that the system needs the ability to over-commit memory allocations (allow CoW allocations etc. when there is no actual memory available to back them), and over-commit currently requires swap, and probably should (some place to dump pages in case an over-committed allocation comes calling).

Overcommit does not use or require swap. You can have overcommit enabled on Linux and not be using swap at all.
But then you may meet the OOM killer.
Again, that has nothing to do with swap being enabled.
My understanding is that swap would delay or prevent the OOM killer from kicking in. Is that wrong?
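
For anyone who wants to check this on their own box: on Linux the overcommit policy (`/proc/sys/vm/overcommit_memory`) and the amount of swap (`SwapTotal` in `/proc/meminfo`) are separate settings, so you can look at both. A minimal sketch, assuming a Linux system; the function names and the shape of the summary are invented for the example.

```python
import os

# Meaning of the values in /proc/sys/vm/overcommit_memory on Linux.
OVERCOMMIT_MODES = {0: "heuristic", 1: "always", 2: "never (strict accounting)"}


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def summarize(meminfo_text, overcommit_mode):
    info = parse_meminfo(meminfo_text)
    return {
        "overcommit_mode": OVERCOMMIT_MODES.get(overcommit_mode, "unknown"),
        "swap_enabled": info.get("SwapTotal", 0) > 0,
        # Committed_AS is what has been promised to processes so far.
        "committed_exceeds_ram": info.get("Committed_AS", 0) > info.get("MemTotal", 0),
    }


MEMINFO = "/proc/meminfo"
MODE = "/proc/sys/vm/overcommit_memory"
if os.path.exists(MEMINFO) and os.path.exists(MODE):  # only on Linux
    with open(MEMINFO) as f:
        meminfo_text = f.read()
    with open(MODE) as f:
        mode = int(f.read().strip())
    print(summarize(meminfo_text, mode))
```

A machine reporting `swap_enabled: False` together with `committed_exceeds_ram: True` is overcommitted with no swap at all. Whether swap then delays the OOM killer is a separate question that this check does not answer.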