Hacker News
by quotemstr 2611 days ago
Sure, but you still end up with double-caching and frequent entry into the kernel. madvise will never give you as much control and flexibility as just getting the system out of the way and doing the work yourself. (DB systems can also use fun tricks like compressing their cached pages.)

Why would mmap lead to double caching? I can't follow.
It's not that mmap per se leads to double caching, but that combining the page cache with application-level caching leads to double caching. Say you're reading hugecactus.png into your image processing program. Whether you use mmap(2) or ordinary read(2), the first step in reading hugecactus.png is the kernel DMAing the bytes into the page cache. In the mmap case, the kernel maps the page cache into your application's address space. In the read case, the kernel copies from the page cache to the application read buffer. Now suppose your application PNG-decodes hugecactus.png into RGB raster data. Now, whether you used mmap or read, the system has both the decoded RGB data blob (in your application's memory) and the original PNG data (in the page cache) in memory. That's usually wasteful.

(Yes, you can reduce the severity of this problem with MADV_DONTNEED and friends.)
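
To make that concrete, here's a rough sketch of the "decode, then tell the kernel you're done with the source bytes" pattern. Assumptions: zlib stands in for the PNG decoder, and the names decode_and_release and demo are made up for illustration. Both calls are only hints. MADV_DONTNEED on a file-backed mapping drops the pages from this process's mapping, and POSIX_FADV_DONTNEED asks the kernel to drop the cached file pages, but the kernel may ignore either one.

```python
import mmap
import os
import tempfile
import zlib


def decode_and_release(path):
    """Decode a zlib-compressed file through mmap, then hint that the
    source pages are no longer needed. (Hypothetical helper.)"""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Decode straight out of the mapping, without an extra copy.
            with memoryview(mm) as view:
                decoded = zlib.decompress(view)
            # Hint: this process no longer needs the mapped pages.
            # mmap.madvise needs Python 3.8+ and platform support.
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_DONTNEED"):
                mm.madvise(mmap.MADV_DONTNEED)
        # Hint: the kernel may drop this file's pages from the page cache.
        # A length of 0 means "through the end of the file".
        if hasattr(os, "posix_fadvise") and hasattr(os, "POSIX_FADV_DONTNEED"):
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
    return decoded


def demo():
    """Round-trip 1 MiB of sample data through a temporary file."""
    raw = bytes(range(256)) * 4096
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(zlib.compress(raw))
        path = tmp.name
    try:
        return decode_and_release(path) == raw
    finally:
        os.unlink(path)
```

After decode_and_release returns, the only copy you're deliberately keeping is the decoded one. Whether the compressed bytes actually leave the page cache is still up to the kernel.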

Well, the kernel side cache is not much of a problem. The kernel is free to evict those pages at any time to respond to memory pressure etc. Linux treats its file system cache almost like unused memory in that it is normally the biggest pool from which memory allocations for processes are drawn. Essentially, keeping the pages around in the cache is an optimization, because explicitly overwriting them too aggressively is just unnecessary work.