Hacker News
by guard-of-terra 5180 days ago
One behavior I've noticed with Linux is that if you read files sequentially from disk (for example, with scp), Linux fills all of memory with those files' contents and then swaps out everything except the (obviously useless) disk cache. So all your memory ends up holding data you will never need again, and trying to do anything causes a large and painful swap-in (which had the side effect of halting my QEMU instance).

This is true insanity. Sure, you can disable swap or tune swappiness, but what's the reason for such a crazy default?
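
One way to watch this happen is to compare the Cached and swap figures in /proc/meminfo before and after a large copy. Below is a minimal Python sketch; the helper names are made up for illustration, and it assumes the usual Linux /proc/meminfo layout of "Name:   value kB" lines.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def cache_and_swap(text: str) -> dict[str, int]:
    """Summarize page-cache size and swap in use, both in kB."""
    info = parse_meminfo(text)
    return {
        "cached_kb": info.get("Cached", 0),
        "swap_used_kb": info.get("SwapTotal", 0) - info.get("SwapFree", 0),
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        print(cache_and_swap(meminfo.read_text()))
```

Running it before and after a big scp should show Cached growing and, with an aggressive swappiness, swap use growing too.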

5 comments

Use rsync. It now preserves the buffer-cache state that files had before the copy, so it doesn't stomp on your allocations.

http://insights.oetiker.ch/linux/fadvise/
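
The underlying mechanism is posix_fadvise(): the program tells the kernel it won't need a byte range again. Here is a simplified Python sketch of the idea. It is not the rsync patch itself (which, as described at the link, tracks which pages were cached beforehand); it just drops everything it copies. The function name and the 1 MiB chunk size are arbitrary choices, and it assumes Linux, where os.posix_fadvise is available.

```python
import os

CHUNK = 1 << 20  # 1 MiB per read/write; arbitrary choice for this sketch


def copy_without_caching(src: str, dst: str, chunk: int = CHUNK) -> int:
    """Copy src to dst, asking the kernel to drop cached pages as we go.

    Hypothetical helper; uses os.posix_fadvise (Linux). Returns bytes copied.
    """
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        in_fd, out_fd = fin.fileno(), fout.fileno()
        # Hint that the input is read sequentially (len 0 means "to end of file").
        os.posix_fadvise(in_fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)
            fout.flush()
            # Dirty pages can't be dropped until written back, so force writeback first.
            os.fsync(out_fd)
            # Tell the kernel we won't touch these ranges again.
            os.posix_fadvise(in_fd, copied, len(buf), os.POSIX_FADV_DONTNEED)
            os.posix_fadvise(out_fd, copied, len(buf), os.POSIX_FADV_DONTNEED)
            copied += len(buf)
    return copied
```

Calling fsync for every chunk slows the copy down; syncing every few chunks instead would be a reasonable tradeoff.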

Also, consider using --bwlimit to throttle the copy speed, so the spindle can still respond to other I/O requests (25-50% of the unthrottled speed seemed to be a reasonable tradeoff).

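
The idea behind a bandwidth limit can be sketched in a few lines of Python: after each chunk, sleep until the average rate falls back to the target. This is only an illustration of the technique, not how rsync implements --bwlimit; the function name is hypothetical, and the clock/sleep parameters exist only so the pacing can be tested without real waiting.

```python
import time


def throttled_copy(fin, fout, bytes_per_sec: float, chunk: int = 64 * 1024,
                   sleep=time.sleep, clock=time.monotonic) -> int:
    """Copy between binary file objects at an average rate <= bytes_per_sec."""
    start = clock()
    copied = 0
    while True:
        buf = fin.read(chunk)
        if not buf:
            break
        fout.write(buf)
        copied += len(buf)
        # Earliest moment by which `copied` bytes are allowed at the target rate.
        target = start + copied / bytes_per_sec
        delay = target - clock()
        if delay > 0:
            sleep(delay)  # leaves idle time for other I/O on the same disk
    return copied
```

With the rule of thumb above, if an unthrottled copy runs at 100 MB/s you would pass something between 25 and 50 MB/s.
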
I suspect the reason is that the system has noticed your apps haven't touched their memory for a long time and so "don't need it". This is a reasonably valid assumption on servers: if some daemons sit idle in the background for minutes or hours at a time, keeping their memory resident is a waste. On the desktop, however, responsiveness (latency) matters more than throughput. Just because you only switch between apps every few minutes or hours doesn't mean the kernel should swap them out, so the algorithm needs different weighting.
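
A toy model makes the "haven't touched it for a while" reasoning concrete. The Python sketch below replays page accesses through a single plain LRU list; the real kernel is more elaborate (it keeps active/inactive lists, and swappiness biases reclaim between anonymous and file-backed pages), so treat this only as an illustration of why a one-pass streaming read can push out idle application pages.

```python
from collections import OrderedDict


def simulate_lru(capacity: int, accesses: list[str]) -> list[str]:
    """Return resident pages (oldest first) after replaying accesses through a plain LRU."""
    lru = OrderedDict()  # page name -> None, ordered from least to most recently used
    for page in accesses:
        if page in lru:
            lru.move_to_end(page)  # touched again: now the most recently used
        else:
            if len(lru) >= capacity:
                lru.popitem(last=False)  # evict the least recently used page
            lru[page] = None
    return list(lru)


# Two idle app pages, then a streaming copy touching six file pages once each:
# simulate_lru(4, ["app1", "app2"] + [f"file{i}" for i in range(6)])
# leaves only file pages resident; the app pages were evicted.
```

If the app pages are touched again during the copy, they move to the front of the queue and survive instead; that is exactly the signal a backgrounded desktop app never gives.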

I used to experience this problem, but I haven't lately. I suspect that the "-desktop" kernel variant of openSUSE uses a differently weighted swappiness algorithm. If your distro offers a choice of kernel variants, you could try them; otherwise (or if that doesn't help), you could track down the knobs you need to tweak to make the problem go away.

I guess the kernel cannot know that the application (scp, in your case) will never touch the data again.

The current default behavior, which you call crazy, does however favor programs that actually need the data they just read (e.g., databases).

Would mmapping those files instead of reading them provide saner behavior, or does the kernel still do that?

Reading a file with mmap() results in the same behavior.

I'm guessing you're either running a rather old distribution/kernel, or your swappiness is set far too aggressively. Try setting it to 0.
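
Swappiness lives in /proc/sys/vm/swappiness; "sysctl -w vm.swappiness=0" does the same thing, and putting "vm.swappiness = 0" in /etc/sysctl.conf (or a file under /etc/sysctl.d/) makes it survive a reboot. A minimal Python sketch of reading and setting it follows; the helper names are made up, writing needs root, and the path is a parameter only so the functions can be tried against an ordinary file.

```python
from pathlib import Path

SWAPPINESS = Path("/proc/sys/vm/swappiness")


def get_swappiness(path: Path = SWAPPINESS) -> int:
    """Read the current vm.swappiness value."""
    return int(path.read_text().strip())


def set_swappiness(value: int, path: Path = SWAPPINESS) -> None:
    """Set vm.swappiness; needs root on a real system and does not persist across reboots."""
    if value < 0:
        raise ValueError("swappiness cannot be negative")
    # The valid upper bound depends on the kernel version, so let the kernel reject bad values.
    path.write_text(f"{value}\n")
```
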