HN Mirror



	by colordrops 1466 days ago
	It at least doesn't lock anything up that has a file open when the network goes down. NFS is a nightmare with that. NFS is more idiomatic on *nix but still a huge pain when dealing with matching file perms across systems.

3 comments

Athas 1466 days ago

> It at least doesn't lock anything up that has a file open when the network goes down.

I must admit I feel quite a bit of irrational fury when this happens (similarly, when DNS lookups hang). That some other computer is down should never prevent me from doing, closing, or killing anything on my computer. Make the system call return an error immediately! Remove the process from the process table! Do anything! I can power cycle the computer to get out of it, so clearly a hanging NFS server is not some kind of black hole in our universe from which no escape is possible.
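
One way to approximate "return an error immediately" from user space is to do the risky filesystem call in a throwaway child process and give up on it after a timeout. The following is a minimal sketch under that assumption, not something proposed in the thread; the helper name `path_responds` and the default timeout are hypothetical.

```python
import subprocess
import sys


def path_responds(path: str, timeout_s: float = 2.0) -> bool:
    """Return True if a stat() of `path` succeeds in a child within timeout_s.

    Returns False if the stat fails (for example, the path does not exist)
    or if the child has not finished when the timeout expires.
    """
    # Run the stat in a separate interpreter so that a hung mount blocks
    # the child, not the calling process.
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return child.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Send SIGKILL but deliberately do not wait() again: if the child is
        # stuck in the kernel, waiting for it would hang the caller too.
        child.kill()
        return False
```

A caller could check `path_responds("/mnt/nfs")` (a hypothetical mount point) before touching files there. The trade-off is that a child that cannot be killed lingers until the server comes back.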


voxadam 1466 days ago

> I must admit I feel quite a bit of irrational fury when this happens (similarly, when DNS lookups hang).

Neither of those reactions is in any way irrational. In fact, they're not only perfectly reasonable and understandable but felt by a great many of us here on HN.


js2 1466 days ago

This is not the fault of NFS. The same thing would happen if a local filesystem suddenly went missing. The kernel treats NFS mounts as just another filesystem. You can in fact mount shares as soft or interruptible if you want.

https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storag...


dikei 1466 days ago

Soft mounts can lead to data inconsistency, so they're not always a good choice.
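
On a soft mount a timed-out request comes back to the application as an error instead of blocking forever, so the inconsistency risk is largely about programs that never look at those errors. Here is a minimal sketch of a write path that does check them, assuming the failure surfaces as an `OSError` from `write`, `fsync`, or `close`; the helper name `write_checked` is hypothetical.

```python
import os


def write_checked(path: str, data: bytes) -> None:
    """Write `data` to `path`, raising OSError if any step reports failure."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write may accept fewer bytes than requested, so loop until
        # everything has been handed to the kernel.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Buffered writes can fail later than the write() call itself;
        # fsync asks the kernel to push the data out and report any error.
        os.fsync(fd)
    finally:
        # close() can also raise; it is not wrapped, so the caller sees it.
        os.close(fd)
```

If any of these calls raises, the caller knows the file may be incomplete and can retry or abort, which is the part that gets skipped when soft-mount errors go unnoticed.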


dikei 1466 days ago

> It at least doesn't lock anything up that has a file open when the network goes down. NFS is a nightmare with that.

Yeah, we've been bitten by this too, around once a year, even with our fairly reliable and redundant network. It's a PITA, your process just hangs and there's no way to even kill it except restarting the server.


toast0 1466 days ago

> It's a PITA, your process just hangs and there's no way to even kill it except restarting the server.

If you can bring the missing server back online, the NFS mount should recover.


trasz 1466 days ago

This sounds like a Linux client bug (failure to properly implement the “intr” mount option), not the fault of NFS itself.


grosswait 1466 days ago

It’s a failure to use the intr mount option. I’ve never had a problem using soft mounts either, which makes the described problem nonexistent.


jabl 1466 days ago

intr/nointr are no-ops in Linux. From the nfs(5) manpage (https://www.man7.org/linux/man-pages/man5/nfs.5.html ):

> intr / nointr This option is provided for backward compatibility. It is ignored after kernel 2.6.25.

(IIRC when that change went in there were also some related changes to more reliably make processes blocked on a hung mount SIGKILL'able)


smarks 1466 days ago

This is too bad. The sweet spot was "hard,intr", at least when I was last using NFS on a daily basis (mid 1990s). Hard mounts make sense for programs, which will happily wait indefinitely while blocked in I/O. This worked well for things like doing a build over NFS, which would hang if the server crashed and then pick up right where it left off when the server rebooted.

Of course this is irritating if you're blocked waiting for something incidental, like your shell doing a search of PATH. In those cases you could just control-C and continue doing what you wanted to do (as long as it didn't actually need that NFS server).

However I can see that it would be difficult to implement interruptibility in various layers of the kernel.


jabl 1465 days ago

I think the current implementation comes reasonably close to the old "intr" behavior.

AFAICT the problem with "intr" wasn't that the kernel parts were impossible to implement, but rather an application correctness issue, as few applications are prepared to handle EINTR in any I/O syscall. However, with "nointr" the process would be blocked in uninterruptible sleep and would be impossible to kill.

However, if the process is about to be killed by the signal, then not handling EINTR is irrelevant. Thus in 2.6.25 a new process state TASK_KILLABLE was introduced (https://lwn.net/Articles/288056/ ), which is a bit like TASK_UNINTERRUPTIBLE except the task can be interrupted by a fatal signal, and the NFS client code was converted to use it in https://lkml.org/lkml/2007/12/6/329 . So the end result is that the process can be killed with Ctrl-C (as long as it hasn't installed a non-default SIGINT handler), but doesn't need to handle EINTR for all I/O syscalls.
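
To see whether a given process is currently sitting in this kind of blocked sleep, one option on Linux is to read the state letter from `/proc/<pid>/stat`. A minimal sketch follows, assuming the usual `pid (comm) state ...` layout of that file and that both uninterruptible and killable sleeps are reported as `D`; the function names are hypothetical.

```python
def parse_proc_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split after the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def is_blocked_in_io(pid: int) -> bool:
    """True if the process is currently in state 'D' (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_state(f.read()) == "D"
```

A process that stays in `D` while its stack points into the NFS client is the situation described above. Whether a fatal signal then gets through depends on the kernel using the killable variant of the sleep.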
