by swillden 6307 days ago
Another option: use mmap() and rewrite it in place, using ftruncate() if necessary to grow/shrink the file, and msync() to flush the changes.

This is also risky because it could leave the file in an inconsistent, partially-modified state. However, in practice it seems to work really well, and Google turns up a striking lack of articles discussing the risks of inconsistency due to use of mmap.

I'd like to learn more about the risks of this approach, since I'm using it to selectively rewrite portions of a very important file in a project of mine. I'm pretty sure that my usage pattern is safe. The writes are small enough that they nearly always affect only one disk block and never touch more than two, and I'm careful to msync() at the right places (actually mmap.mmap.flush(), since I'm using Python), but I'm interested in learning about potential issues. This file is too large to make copying it convenient, but I'll do that if necessary.
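
For readers who want to see the pattern concretely, here is a minimal Python sketch of the approach described above: map the file, overwrite a small range, and flush. The function name `rewrite_in_place` and its parameters are hypothetical, not taken from the poster's project. It does not make the update atomic; a crash partway through can still leave the range partially modified, which is exactly the risk being asked about.

```python
import mmap
import os


def rewrite_in_place(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes at `offset` through a shared mapping."""
    if not new_bytes:
        return
    end = offset + len(new_bytes)
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        if end > size:
            # Grow the file first (ftruncate under the hood); the mapping
            # below covers the file as it exists at mmap() time.
            f.truncate(end)
        with mmap.mmap(f.fileno(), 0) as m:  # length 0 maps the whole file
            m[offset:end] = new_bytes
            m.flush()  # msync() on Unix: write dirty pages back to the file
        # Conservative extra step (an assumption, may be redundant on some
        # platforms): also fsync the descriptor so a size change is flushed.
        os.fsync(f.fileno())
```

Calling `rewrite_in_place("data.bin", 4096, b"new value")` would replace nine bytes starting at byte 4096, growing the file if needed.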

2 comments

"Another option: use mmap() and rewrite it in place, using ftruncate() if necessary to grow/shrink the file, and msync() to flush the changes."

This is a fine option, but for the purpose of this bug I don't think there is any advantage over using stdio (with seeks if necessary) and calling fsync() after writing. It's a great technique, but functionally equivalent to just saying "avoid O_TRUNC".
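
A rough Python equivalent of the seek-and-fsync alternative, as a sketch only. The name `overwrite_range` is made up for illustration. Opening with mode "r+b" updates the file without truncating it, which is the "avoid O_TRUNC" part.

```python
import os


def overwrite_range(path, offset, new_bytes):
    """Overwrite bytes at `offset` using ordinary buffered I/O plus fsync."""
    # "r+b" opens an existing file for update and does not truncate it.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)
        f.flush()             # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write it to stable storage
```

Like the mmap version, this overwrites in place, so it carries the same exposure to a crash in the middle of a write.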

Unless you have some nifty way of upgrading from a copy-on-write mmap(MAP_PRIVATE) to something that gets atomically written to disk?

Might using a berkdb (as suggested) make more sense for your case? It can be devilishly tricky getting these things to work right.

Possibly. The ability to view/tweak the file in a text editor is something I find very convenient, but it's less important than consistency.

You can always write a wrapper to start a transaction, extract the file from the database, exec $EDITOR on it, replace the database data with the file, and commit the transaction. That is very safe, and very easy to do. (And, you can make the txn apply to multiple files, which is quite useful.)

We use this technique in the command line utility for KiokuDB.
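
The wrapper described above might look roughly like this in Python. This is a sketch under loud assumptions: it uses the standard library's sqlite3 as a stand-in for Berkeley DB, the `files` table and the `edit_in_transaction` name are invented for the example, and it is not how the KiokuDB utility is implemented.

```python
import os
import sqlite3
import subprocess
import tempfile


def edit_in_transaction(db_path, name, editor_cmd=None):
    """Check out the blob stored under `name`, edit it, commit the result."""
    if editor_cmd is None:
        editor_cmd = [os.environ.get("EDITOR", "vi")]
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        # Hypothetical schema: one row per stored file.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, body BLOB)"
        )
        conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
        try:
            row = conn.execute(
                "SELECT body FROM files WHERE name = ?", (name,)
            ).fetchone()
            fd, tmp = tempfile.mkstemp(suffix=".txt")
            try:
                with os.fdopen(fd, "wb") as f:
                    f.write(row[0] if row else b"")
                # Blocks until the editor exits; non-zero exit aborts the edit.
                subprocess.run(editor_cmd + [tmp], check=True)
                with open(tmp, "rb") as f:
                    body = f.read()
            finally:
                os.remove(tmp)
            conn.execute(
                "INSERT OR REPLACE INTO files (name, body) VALUES (?, ?)",
                (name, body),
            )
            conn.execute("COMMIT")
        except BaseException:
            conn.execute("ROLLBACK")  # leave the stored copy untouched
            raise
    finally:
        conn.close()
```

If the editor exits with an error, or anything else fails before the commit, the stored copy is left as it was. Editing several files in one transaction would mean checking them all out before launching the editor and writing them all back before the single COMMIT.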