by hedora 1468 days ago
Did the Linux kernel ever fix the thing where it evicts code pages at the same priority as files mapped read/write?

I haven't checked in the last 4 years or so, but before that, every time I worked with a Linux-based storage system that used mmap to write to files, I ended up rewriting it to use pread/pwrite.

Each time, there was no perceptible CPU hit, but there was a massive page cache / memory pressure win. It turns out that aggressively evicting warm code pages then faulting them back in is bad for system performance, even with a fast SSD.
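
For anyone who hasn't done this kind of rewrite, here is a rough sketch of the two write paths in Python (the stdlib `mmap` module and `os.pread`/`os.pwrite` wrap the same syscalls on Unix). The function names are made up for illustration, and this only shows the shape of the change, not the eviction behaviour discussed above.

```python
import mmap
import os

def write_via_mmap(path, offset, data):
    """Update a file region through a shared read/write mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[offset:offset + len(data)] = data
            mm.flush()  # push dirty pages back to the file

def write_via_pwrite(path, offset, data):
    """Same update with an explicit positional write, no mapping involved."""
    fd = os.open(path, os.O_RDWR)
    try:
        view = memoryview(data)
        while view:  # pwrite may write fewer bytes than requested
            n = os.pwrite(fd, view, offset)
            view = view[n:]
            offset += n
    finally:
        os.close(fd)

def read_via_pread(path, offset, length):
    """Positional read; may return fewer bytes than asked near end of file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)
```

With pread/pwrite the application decides exactly which bytes cross the syscall boundary, instead of leaving a large read/write mapping resident in its address space.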

2 comments

There's nothing to "fix" here. In some cases what you want is not optimal. It is perfectly reasonable for the kernel to prioritize data pages you touched more recently over code pages by default. It's essentially a big LRU, always has been.

If you don't like that, you can always use mlock(). You can also tune things like writeback sysctls and readahead behavior. But I disagree that it's "broken" just because it doesn't do what you want by default.

In a post-Spectre/Meltdown world syscalls are a bit more expensive, so you'd be hard-pressed to compete with the journal's mmap windows using pread/pwrite, especially with a warm page cache, and especially if you went about it naively and turned every little object access into its own little island of buffered IO. The objects in the journal are quite small, so you'd likely end up having to implement your own page cache/buffer manager in userspace to coalesce the syscalls.
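
To make the coalescing point concrete, here is a minimal sketch in Python of the kind of userspace buffering I mean: small objects get queued in memory and go out in one pwrite per batch. `CoalescingWriter`, its `capacity` parameter, and the `syscalls` counter are all hypothetical names for illustration; this is not how the journal is implemented, and it covers only append-style writes, not reads or cache invalidation.

```python
import os

class CoalescingWriter:
    """Batch small appends so many objects share one pwrite (illustrative only)."""

    def __init__(self, fd, start_offset=0, capacity=64 * 1024):
        self.fd = fd
        self.offset = start_offset  # file offset where the buffered bytes will land
        self.capacity = capacity    # flush once the buffer would grow past this
        self.buf = bytearray()
        self.syscalls = 0           # counts pwrite calls, for comparison

    def append(self, obj):
        """Queue one small object and return the file offset it will occupy."""
        if self.buf and len(self.buf) + len(obj) > self.capacity:
            self.flush()
        pos = self.offset + len(self.buf)
        self.buf += obj
        return pos

    def flush(self):
        """Write everything buffered so far, retrying on short writes."""
        view = memoryview(self.buf)
        written = 0
        while written < len(view):
            written += os.pwrite(self.fd, view[written:], self.offset + written)
            self.syscalls += 1
        self.offset += written
        self.buf = bytearray()
```

Ten 16-byte objects with a 64-byte buffer go out in three writes instead of ten. The price is that you now own the buffering, flushing, and consistency logic the page cache was doing for you.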

It'd be far more interesting to explore an io_uring based implementation IMNSHO.