Hacker News new | ask | show | jobs
by jandrewrogers 1085 days ago
I think the real argument is more nuanced. Where you see mmap() fail badly on Linux, even for read-only workloads, is under a few specific conditions: very large storage volumes, highly concurrent access, non-trivial access patterns (e.g. high-dimensionality access methods). Most people do not operate data models under these conditions, but if you do then you can achieve large integer factor gains in throughput by not using mmap().

Interestingly, most of the reason for these problems has to do with theoretical limitations of cache replacement algorithms as drivers of I/O scheduling. There are alternative approaches to scheduling I/O that work much better in these cases but mmap() can’t express them, so in those cases bypassing mmap() offers large gains.

3 comments

GP wrote a key-value store called LMDB that is constrained to a single writer, and often used for small databases that fit entirely in memory but need to persist to disk. There's a whole different world for more scalable databases.
"fit entirely in memory" is not a requirement. LMDB is not a main-memory database, it is an on-disk database that uses memory mapping.
Can you explain "high-dimensionality access methods" to me? (Or if it's too big for an HN comment, maybe recommend a paper).
This guy talks a lot of crap. See his website for examples, and don't waste your time with him

<<<There is one significant drawback that should not be understated. Algorithm design using topology manipulation can be enormously challenging to reason about. You are often taking a conceptually simple algorithm, like a nested loop or hash join, and replacing it with a much more efficient algorithm involving the non-trivial manipulation of complex high-dimensionality constraint spaces that effect the same result. Routinely reasoning about complex object relationships in greater than three dimensions, and constructing correct parallel algorithms that exploit them, becomes easier but never easy.>>>

http://www.jandrewrogers.com/2015/10/08/spacecurve/

I'd imagine same kind of worst case access would also be a problem doing IO the "classical" way