Hacker News
by wtallis 1989 days ago
Yep, mmap is really bad for performance on modern hardware because you can only fault on one page at a time (per thread), but SSDs require a high queue depth to deliver the advertised throughput. And you can't overcome that limitation by using more threads, because then you spend all your time on context switches. Hence, io_uring.

Can't you just use MAP_POPULATE, which asks the system to populate the entire mapped address range? That's kind of like page-faulting on every page simultaneously.
That usually works if you have sufficient RAM, and do plan to touch substantially all of the file, and don't have any tight QoS targets to meet around the time you map the file.
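A minimal sketch of the MAP_POPULATE approach, assuming Linux (the flag is Linux-specific) and using a hypothetical helper name, `map_populated`. The whole file is read in during the `mmap()` call itself, which is where the RAM and latency caveats in the reply come from.

```c
#define _GNU_SOURCE  /* MAP_POPULATE is Linux-specific */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the whole file at path read-only and ask the kernel to fault in
 * every page up front. Returns MAP_FAILED on error (including an empty
 * file, which cannot be mapped); on success stores the length in *len_out. */
static void *map_populated(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    /* MAP_POPULATE does the reads inside mmap() itself, so this call can
     * block for a long time on a large file. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                   MAP_PRIVATE | MAP_POPULATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p != MAP_FAILED)
        *len_out = (size_t)st.st_size;
    return p;
}
```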
If you're reading sequentially, this shouldn't be a problem, because the VM system can pick up on the access pattern, or you can give it an explicit hint with madvise.
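For the sequential case, the hint is a single `madvise()` call on the mapping. This is a sketch assuming Linux/glibc; `sum_file_sequential` is a hypothetical helper that just sums bytes as a stand-in for real processing, and the kernel is free to ignore the hint.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum every byte of a file through a read-only mapping, telling the
 * kernel up front that pages will be touched in order, so it may read
 * ahead aggressively and free pages soon after they are accessed.
 * Returns -1 on error. */
static int64_t sum_file_sequential(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {  /* nothing to map */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    /* Only a hint: a failure here is not fatal, so the result is ignored. */
    (void)madvise((void *)p, len, MADV_SEQUENTIAL);

    int64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];

    munmap((void *)p, len);
    return sum;
}
```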

If you're reading randomly, this is true, and you want some kind of async I/O or a multiple-read operation.
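One portable way to submit several random reads at once is POSIX `lio_listio()`. A sketch under these assumptions: `read_batch` is a hypothetical helper, the batch cap of 64 is arbitrary, and on glibc older than 2.34 you would link with `-lrt`. Be aware that glibc implements POSIX AIO with user-space threads, so this illustrates the batched-submission interface rather than the kernel-level queueing that io_uring provides.

```c
#define _GNU_SOURCE
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_BATCH 64  /* arbitrary cap for this sketch */

/* Read n independent (offset, length) ranges of the file at path into
 * bufs[i], submitting all of them with one lio_listio() call.
 * Returns 0 if every read completed in full, -1 otherwise. */
static int read_batch(const char *path, const off_t *offsets,
                      const size_t *lens, char **bufs, size_t n)
{
    if (n == 0 || n > MAX_BATCH)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cbs[MAX_BATCH];
    struct aiocb *list[MAX_BATCH];
    for (size_t i = 0; i < n; i++) {
        memset(&cbs[i], 0, sizeof cbs[i]);
        cbs[i].aio_fildes = fd;
        cbs[i].aio_offset = offsets[i];
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = lens[i];
        cbs[i].aio_lio_opcode = LIO_READ;
        list[i] = &cbs[i];
    }

    /* LIO_WAIT: queue the whole list, then block until it has completed. */
    int rc = lio_listio(LIO_WAIT, list, (int)n, NULL) == 0 ? 0 : -1;

    /* If lio_listio() was interrupted by a signal, requests may still be in
     * flight; wait for each one before the control blocks go out of scope. */
    for (size_t i = 0; i < n; i++) {
        const struct aiocb *const one[1] = { &cbs[i] };
        while (aio_error(&cbs[i]) == EINPROGRESS)
            aio_suspend(one, 1, NULL);
        if (aio_error(&cbs[i]) != 0 ||
            aio_return(&cbs[i]) != (ssize_t)lens[i])
            rc = -1;  /* failed, or short read (e.g. past end of file) */
    }

    close(fd);
    return rc;
}
```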

mmap is also dangerous because there's no good way to return errors if the I/O fails, like if the file is shrunk underneath you or is on an external drive that goes away; the process just gets a SIGBUS on whatever access faults.
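To see what "no good way to return errors" means in practice, here is a sketch that maps a file, shrinks it underneath the mapping, and then reads from the mapping. It assumes Linux, where the read raises SIGBUS rather than returning an error code. `read_after_truncate_faults` is a hypothetical helper, and catching the signal with `sigsetjmp`/`siglongjmp` is only for demonstration, not a recommended recovery strategy.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;

static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_env, 1);  /* leave the faulting access */
}

/* Create path with two pages, map it, shrink the file to zero bytes,
 * then read the mapping. Returns 1 if the read raised SIGBUS, 0 if it
 * completed normally, -1 if setup failed. */
static int read_after_truncate_faults(const char *path)
{
    size_t len = 2 * (size_t)sysconf(_SC_PAGESIZE);
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Stand-in for another process truncating the file underneath us. */
    int trunc_rc = ftruncate(fd, 0);

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int faulted = -1;
    if (trunc_rc == 0) {
        if (sigsetjmp(bus_env, 1) == 0) {
            /* A plain load: there is no return value to check for errors. */
            (void)((volatile const char *)map)[0];
            faulted = 0;
        } else {
            faulted = 1;
        }
    }

    sigaction(SIGBUS, &old, NULL);  /* restore the previous handler */
    munmap(map, len);
    close(fd);
    return faulted;
}
```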

Even if you use madvise() for a large sequential read, the kernel will often restrict its behavior to something suboptimal with respect to performance on modern hardware.
If I read with a huge block size, say 100 MB, will the OS request things in a sane way?
Yeah. Linux will end up splitting the request into blocks of typically 128 kB, but they're submitted to the SSD as a batch rather than one at a time, so there's enough work to keep the drive properly busy. But only do this if you actually need all 100 MB. If you're randomly accessing only bits and pieces of the file, it's usually better to stick with 4 kB requests (or larger, if your file format and access patterns make that appropriate).
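With plain `pread()`, both patterns from the reply can be sketched as follows, assuming POSIX and using hypothetical helpers `pread_full` and `read_block`. The loop matters for the large request because a single `pread()` may return fewer bytes than asked for; the 4096-byte `BLOCK_SIZE` corresponds to the 4 kB figure in the reply.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096  /* small request size for random access */

/* Read len bytes starting at offset as one logical request, retrying on
 * short reads and EINTR. Per the discussion, the kernel splits a large
 * request into smaller device I/Os and queues them together.
 * Returns bytes read (less than len only at end of file), or -1 on error. */
static ssize_t pread_full(int fd, void *buf, size_t len, off_t offset)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted before reading anything; retry */
            return -1;
        }
        if (n == 0)
            break;  /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Random access: fetch only the one 4096-byte block that is needed. */
static ssize_t read_block(int fd, unsigned long index, void *buf)
{
    return pread_full(fd, buf, BLOCK_SIZE, (off_t)index * BLOCK_SIZE);
}
```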