Hacker News
by jasonhansel 1085 days ago
I've become convinced that there are very few, if any, reasons to mmap a file on disk. It seems to simplify things in the common case, but in the end it adds a massive amount of unnecessary complexity.
5 comments

It's incredibly useful in read-only, memory-constrained scenarios. E.g., we used to mmap all of our animation data on many rendering engines I worked on, where having ~20-50 MB of animation data and only "paying" a few tens of KB based on usage patterns was very handy. It becomes even more powerful when you have multiple processes sharing that data and the kernel is able to reuse clean pages across processes.

From reading the paper, most of the concerns are around the write side. LMDB is the primary implementation I know of that leans heavily into mmap, but it also comes with a number of constraints there (single writer, long-lived read transactions can block page reuse and lead to unbounded growth of the database file, etc.). As with any tech choice, it's about knowing the constraints/trade-offs and making appropriate choices for your domain.

Complexity? You mmap it in and then read the multi-terabyte file as if it were an array.

The alternative, actual file I/O, sucks in terms of complexity. I get that you can write bespoke code that performs better, but mmap is a one-liner to turn a file into an array.
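
To make the "file as an array" point concrete, here is a minimal sketch using Python's standard mmap module. The helper name byte_at and the demo file are made up for illustration; error handling is left out on purpose.

```python
import mmap
import os
import tempfile

def byte_at(path, index):
    """Return the byte at `index` by mapping the file read-only."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are faulted in lazily on access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[index]

# Usage: write a small demo file, then index into it like an array.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 16)  # 4096 bytes, value at i is i % 256
    demo_path = tmp.name

value = byte_at(demo_path, 300)  # 300 % 256 == 44
os.unlink(demo_path)
```

Only the pages that are actually touched get read from disk, which is what makes the same pattern workable for files much larger than RAM.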

You need to handle the exceptions/signals every time a disk read fails. With classic I/O, you know when the read will happen. But with memory-mapped files, the exception can happen at any point while you are reading from the mapped memory range.

As for why disk reads fail: yes, that's a thing. It's less common on internal storage (bad sectors), but more common on removable USB devices or network drives (especially over Wi-Fi).
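
For contrast, a rough sketch of the classic-I/O side of that argument in Python: the failure shows up as an exception at one known call site. The helper read_block is hypothetical, and a missing file stands in for a failing device. With a mapped file the same failure would instead arrive as a signal (SIGBUS on Linux) at whichever instruction happens to touch the bad page.

```python
import os
import tempfile

def read_block(path, offset, size):
    """Read `size` bytes at `offset`; return None if the open or read fails."""
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(size)
    except OSError as err:
        # A bad sector or dropped network share would surface here too,
        # at a call site we chose, not somewhere inside later array accesses.
        print(f"read failed: {err}")
        return None

# Usage: one successful read, then one failure after the file is removed.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    demo_path = tmp.name

ok = read_block(demo_path, 6, 5)
os.unlink(demo_path)
missing = read_block(demo_path, 0, 5)  # file is gone, so the open fails
```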

Multi-terabyte? Better hope you have lots of spare RAM for all those page structures the kernel has to keep.
"mmap" in the general case is incredibly useful.

There's so much you get "for free", and the UX/DX of reads/writes to it is great, especially if you're primarily operating on structs instead of raw byte/string data.

(For example, reading a file and "reinterpret_cast<>"'ing it from bytes to in-memory struct representations.)
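
A rough Python analogue of that struct trick, as a sketch only: the record layout (a little-endian uint32 id plus a float32 weight) and the helper read_record are invented for illustration. Note that struct.unpack_from copies the fields out of the mapping, whereas a reinterpret_cast in C++ would alias the mapped bytes directly.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: little-endian uint32 id + float32 weight.
RECORD = struct.Struct("<If")

def read_record(mm, i):
    """Decode record `i` straight out of the mapping, without reading the whole file."""
    return RECORD.unpack_from(mm, i * RECORD.size)

# Usage: write three records, map the file, and decode the second one.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for rec in [(1, 0.5), (2, 1.5), (3, 2.25)]:
        tmp.write(RECORD.pack(*rec))
    demo_path = tmp.name

with open(demo_path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        second = read_record(mm, 1)
        count = len(mm) // RECORD.size
os.unlink(demo_path)
```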

It's just that for the _particular_ case of a DBMS that relies on optimal I/O and transactionality, the general-purpose kernel implementation of mmap falls short of what you can implement by hand.

I've been thinking for the past few years about how to get a scenario like 'git clone' of a large repo to go fast. One thought is to memory-map the destination files being written by git and then copy/unzip the data there. You'd save a copy versus the staging buffer that you'd currently be passing to write(). However, the overhead of managing the TLB shootdowns would likely be fatal except for the largest output files.
If you truss a binary starting up, the OS normally mmaps the binary, at least in the tests I ran.