Hacker News new | ask | show | jobs
by ben-schaaf 2601 days ago
This is essentially how databases like PostgreSQL work, but in essence it only avoids the sys-call overhead. The OS is already caching the file, regardless of mmap, so using pread would have likely been enough for us.

It totally would have been simpler overall, but each incremental step we made was significantly less work than the refactoring required for pread.

2 comments

> The OS is already caching the file

Not necessarily. With O_DIRECT, pread() doesn't put pages into page cache: it just DMAs them directly into your process. Using O_DIRECT and the process-private caching we've been discussing, sophisticated programs (like databases) can (and do!) implement their own "page cache" systems. And because databases have access pattern information that the generic kernel VM subsystem doesn't, such a database can frequently do a better job doing this caching on its own.

I might have undersold the performance advantage of writing your own cache, but let me reiterate the point I was trying to make: The reason we didn't consider doing so was because we weren't having a performance issue. Writing our own cache would be strictly more work than just using pread and accomplished the same thing.
Yeah. For your application, you did the right thing. I was speaking more abstractly.
It totally would have been simpler overall, but each incremental step we made was significantly less work than the refactoring required for pread.

Question.

In 10 years will you be saying this about the next incremental problem that you run into? If you think this likely, then the next incremental problem is an excuse to do it right.

If it's less work to solve that problem than refactor all the relating code, and the impact on maintainability is minimal, likely yes. But considering the amount of users we have and the current lack of any crashes relating to mmap there are unlikely to be any future unforseen issues.
Mmap is right, though. Pread would also be right. There's a tradeoff and the complexity argument would only win if they knew all this when they started.