
mmap is better when:

  * You want your program to crash on any I/O error because you wouldn't handle it anyway (a failed read of a mapped page typically arrives as a SIGBUS signal, not as an error return you can check)
  * You value the programming convenience of treating a file on disk as if the entire thing were in memory
  * The performance is good enough for your use. As the article showed, sequential scans *from a single SSD* are as fast as direct I/O until the page cache fills up, and random access is as fast as direct I/O until the page cache fills up *if you use MADV_RANDOM*. If your data doesn't fit in memory, or is spread across multiple storage devices, or you don't correctly advise the OS about your access patterns, mmap will probably be much slower
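
For anyone who hasn't used the MADV_RANDOM hint mentioned above, here is a rough Python sketch of what it looks like. The function name `read_random_offsets` and the throwaway demo file are made up for illustration; `mmap.madvise` needs Python 3.8+ and `MADV_RANDOM` is only defined on platforms that support it, so the hint is skipped when it's missing. This shows the call pattern only, it is not a benchmark.

```python
import mmap
import os
import tempfile


def read_random_offsets(path, offsets, length=8):
    """Return `length` bytes at each offset, read through a read-only mapping.

    Hypothetical helper for illustration; not from the article.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Tell the kernel we will jump around, so aggressive readahead
            # is not worth it. Guarded because the method and the constant
            # are not available on every platform.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_RANDOM"):
                m.madvise(mmap.MADV_RANDOM)
            # Slicing the mapping faults in only the pages that are touched.
            return [m[o:o + length] for o in offsets]


if __name__ == "__main__":
    # Throwaway 16 KiB file so the sketch runs anywhere.
    fd, demo_path = tempfile.mkstemp()
    try:
        os.write(fd, bytes(range(256)) * 64)
    finally:
        os.close(fd)
    print(read_random_offsets(demo_path, [0, 4096, 255], length=4))
    os.remove(demo_path)
```

Note there is no error handling around the slices: if the device fails underneath the mapping, the process generally just dies, which is the first bullet's point.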

To be clear, normal I/O still benefits from the OS's shared page cache: files that other processes have already loaded will probably still be in memory, so you avoid waiting on the storage device. But unlike mmap, each process doing normal I/O pays the space and time cost of copying the data into its own private memory.
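
A small Python sketch of that copy difference, assuming a toy byte-sum "checksum" as the workload (the function names are invented for the example). Both versions are served from the page cache when the file is already cached; the `read()` version additionally copies every chunk into a buffer owned by the process, while the mmap version walks the mapped pages directly.

```python
import mmap
import os
import tempfile


def checksum_with_read(path, chunk_size=1 << 20):
    """Sum all bytes mod 65536 using ordinary buffered reads."""
    total = 0
    with open(path, "rb") as f:
        while True:
            # Each read() copies data out of the kernel's page cache into
            # a new bytes object that belongs to this process.
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total = (total + sum(chunk)) % 65536
    return total


def checksum_with_mmap(path):
    """Same sum, but reading straight from the mapped page-cache pages."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # The memoryview exposes the mapping without a bulk copy; it is
        # released before the mapping itself is closed.
        with memoryview(m) as view:
            return sum(view) % 65536


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp()
    try:
        os.write(fd, bytes(range(256)) * 64)
    finally:
        os.close(fd)
    print(checksum_with_read(demo_path) == checksum_with_mmap(demo_path))
    os.remove(demo_path)
```

In Python the interpreter overhead will swamp the copy cost, so treat this as a picture of where the copy happens rather than a way to measure it.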