Hacker News
by StillBored 1950 days ago
NFS is notorious for breaking kernel and application assumptions about POSIX semantics. Linux falls into this trap in various ways too, in an effort to simplify the common cases. Timeouts might be appropriate for read/open/etc. calls, but in a way the problems are worse on the write/close/etc. side.

Reading the close() manpage hints at some of those problems, but fundamentally POSIX synchronous file I/O isn't well suited to handling space and I/O errors that are deferred from the originating call. Consider write() calls that are buffered by the kernel but can't be completed due to network or out-of-space conditions. A naive reading of write() would imply that errors are returned immediately, so that the application knows the latest write/record update failed. Yet what really happens is that, for performance reasons, the data from those calls is allowed to be buffered, leading to a situation where an I/O call may return a failure caused by a failure at some point in the past. Given all the ways this can happen, the application cannot accurately determine what was actually written, if anything, since the last serialization event (which is itself another set of problems).
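To make the deferred-error point concrete, here is a minimal sketch in Python, whose os.write/os.fsync/os.close map directly onto the syscalls. The helper name durable_append is made up for illustration. It treats a record as written only once fsync() has returned without error, since a successful write() only means the kernel buffered the data. This is a sketch under those assumptions, not a complete durability recipe: it does not fsync the parent directory, and on failure the caller still cannot tell how much of the record landed.

```python
import os


def durable_append(path: str, record: bytes) -> None:
    """Hypothetical helper: append `record`, raising OSError unless fsync succeeds."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # write() may accept fewer bytes than requested, so loop until done.
        view = memoryview(record)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # A successful write() only means the data was buffered by the kernel.
        # Deferred errors (ENOSPC, EIO, NFS trouble) can surface here instead.
        os.fsync(fd)
    finally:
        # close() can also report a deferred error; it is allowed to propagate.
        os.close(fd)
```

A caller would wrap durable_append(...) in try/except OSError and treat any exception as "state of the last record unknown", which is exactly the ambiguity described above.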

edit: this also gets into the ugly case of the state of the fd being unspecified (per POSIX) following close() failures. So per POSIX the correct response is to retry close(), while simultaneously ensuring that open()s aren't happening anywhere. Linux simplifies this a bit by implying the fd is closed, but that has its own issues.
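For the Linux behavior, a rough sketch of the "close once, never retry" pattern follows. The helper name close_once is hypothetical. It assumes Linux semantics, where the descriptor is released even when close() reports an error (the close(2) manpage warns against retrying), so a retry could close an unrelated fd that another thread has just opened with the same number.

```python
import os
from typing import Optional


def close_once(fd: int) -> Optional[OSError]:
    """Hypothetical helper: close `fd` exactly once and return any error."""
    try:
        os.close(fd)
    except OSError as exc:
        # Assumption: Linux semantics. The fd is already gone at this point,
        # so the error is reported to the caller and close() is NOT retried.
        return exc
    return None
```

An error returned here still means earlier buffered writes may have been lost; the only thing the pattern avoids is the fd-reuse race.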

1 comments

I understand the reasoning, but at the same time wonder if this isn't the perfect being the enemy of the good? Since there is no case where a timeout/error-style exit can be guaranteed to never lose data, we instead lock the entire box up when an NFS server goes AWOL. This still causes the data to be lost, but also brings down everything else.
Well, soft mounts should keep the entire machine from dying, unless you're running critical processes off the NFS mount. Reporting/debugging these cases can be fruitful.

OTOH, PXE/HTTPS+NFS root is a valid config, and there isn't really any way to avoid machine/client death when the NFS server goes offline for an extended period. Even without NFS, Linux has gotten better at dealing with full filesystems, but even that is still hit or miss.