by afr0ck 1083 days ago
Linux raises SIGBUS. A process, especially a database server, should anticipate such I/O failures by installing a SIGBUS handler.
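
A minimal sketch of such a handler, assuming a single-threaded process. The names `install_sigbus_handler`, `guarded_read` and `demo_truncated_mapping` are hypothetical, and the demo provokes SIGBUS by truncating the file underneath the mapping rather than through a real disk error:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Single-threaded sketch: one global jump target and one "armed" flag. */
static sigjmp_buf bus_env;
static volatile sig_atomic_t bus_armed;

static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)info; (void)ctx;
    if (bus_armed)
        siglongjmp(bus_env, 1);   /* unwind to the guarded access */
    /* SIGBUS from somewhere we did not guard: fall back to the default action. */
    signal(SIGBUS, SIG_DFL);
    raise(SIGBUS);
}

int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Copy len bytes out of a mapping. Returns 0 on success, -1 if SIGBUS fired. */
int guarded_read(void *dst, const void *mapped_src, size_t len)
{
    assert(dst != NULL && mapped_src != NULL);
    if (sigsetjmp(bus_env, 1) != 0) {   /* 1: also restore the signal mask */
        bus_armed = 0;
        return -1;
    }
    bus_armed = 1;
    /* Volatile reads keep the accesses between the two flag updates. */
    const volatile unsigned char *src = mapped_src;
    unsigned char *out = dst;
    for (size_t i = 0; i < len; i++)
        out[i] = src[i];
    bus_armed = 0;
    return 0;
}

/* Returns 0 if the read works before truncation and is caught after it. */
int demo_truncated_mapping(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    char buf[16];
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0 || page <= 0)
        return -2;
    unlink(path);
    if (ftruncate(fd, page) != 0 || install_sigbus_handler() != 0) {
        close(fd);
        return -2;
    }
    char *map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -2;
    }
    int before = guarded_read(buf, map, sizeof buf);
    int truncated = ftruncate(fd, 0);   /* the mapped page no longer has backing */
    int after = guarded_read(buf, map, sizeof buf);
    munmap(map, (size_t)page);
    close(fd);
    return (before == 0 && truncated == 0 && after == -1) ? 0 : -1;
}
```

A real server would also have to decide what to do with the failed region (retry, remap, or fail the request); the sketch only shows how to get control back instead of dying.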

For the second part of your comment, on Linux systems, there is the msync() system call that can be used to flush the page cache on demand.


> msync() system call that can be used to flush the page cache on demand.

for everyone, not just the file you mapped to memory. I.e. the guarantee is that your file will be written, but there's no way to do that w/o affecting others. This is not such a hot idea in an environment where multiple threads / processes are doing I/O.

msync() affects only the pages that are part of the mapped range you pass in the arguments. From the man pages:

> int msync(void addr[.length], size_t length, int flags);

> msync() flushes changes made to the in-core copy of a file that was mapped into memory using mmap(2) back to the filesystem

No it doesn't. That's physically impossible. Read what you quoted -- it never says that it's going to do it only for the file in question.

If you don't know why it's not possible, here's a simplified version of it: hardware protocols (such as SCSI) must have fixed-size messages to fit them through the pipeline. I.e. you cannot have a message larger than the memory segment used for communication with the device, because that will cause fragmentation and will lead to the possibility of a message being corrupted (the "tail" being lost or arriving out of order).

On the other hand, to "flush" a file to persistent storage you'd have to specify all blocks associated with the file that need to be written. If you try to do this, it will create a message of arbitrary size, possibly larger than the memory you can store it in. So, the only way to "flush" all blocks associated with a file is to "flush" everything on a particular disk / disks used by the filesystem. And this is what happens in reality when you do any of the sync family commands. The difference is only in what portion of the not-yet synced data the OS will send to the disk before requesting a sync, but the sync itself is for the entire disk, there aren't any other syncs.

I don't know what you're talking about, but msync() flushes only the pages in that range. The pages are in the page cache (on Linux, it's a per-file xarray [1] of pages). Once all the dirty pages in the range are located, they go through the filesystem to be mapped to block numbers and then submitted to the block layer to be written to the storage device. Only the disk blocks mapped to the pages in that range will be written.

Source: I'm a Linux kernel developer.

[1] https://docs.kernel.org/core-api/xarray.html
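
For reference, the range-limited call discussed in this thread can be sketched as follows. `flush_range` and `demo_flush_one_page` are hypothetical names; the only real API used for the flush is `msync()` with `MS_SYNC`, whose address argument must be page-aligned, so the helper rounds the start down to a page boundary:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Ask the kernel to write back only the pages covering [p, p + len)
 * of a MAP_SHARED file mapping. Returns msync()'s result (0 or -1). */
int flush_range(void *p, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    uintptr_t start = (uintptr_t)p & ~((uintptr_t)page - 1);  /* round down */
    size_t span = (size_t)((uintptr_t)p - start) + len;
    return msync((void *)start, span, MS_SYNC);
}

/* Map a 4-page temporary file, dirty one page, flush just that page. */
int demo_flush_one_page(void)
{
    char path[] = "/tmp/msync-demo-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0 || page <= 0)
        return -2;
    unlink(path);
    size_t size = 4 * (size_t)page;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -2;
    }
    char *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -2;
    }
    char *third_page = map + 2 * (size_t)page;
    memcpy(third_page + 10, "hello", 5);       /* dirties only the third page */
    int rc = flush_range(third_page + 10, 5);  /* range passed covers that page only */
    munmap(map, size);
    close(fd);
    return rc;
}
```

The demo only confirms that the call succeeds for a sub-range of the mapping; it cannot observe from user space which blocks actually reached the device.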