by jrockway
649 days ago
There isn't a bug, it's resource exhaustion. You open a bunch of files and they fail to close. You don't log errors on the close, so you have no idea it's happening. Now your app is failing to open new file descriptors to accept HTTP connections. You get a fixed number of fds per process (see ulimit -n). If you don't close the files you've read, the descriptor is gone. The bug in this case is in the filesystem that hangs on close. It happens on network filesystems. You can't return the fd to the kernel if your filesystem doesn't let you.
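As a rough illustration of the "log errors on the close" point, here is a minimal Python sketch for a Unix-like system. The helper names `fd_limit` and `close_logged` are made up for this example and are not from any particular app; it only shows how to read the per-process descriptor limit and how to avoid silently dropping a close error.

```python
import logging
import os
import resource

log = logging.getLogger("fdwatch")  # hypothetical logger name


def fd_limit() -> int:
    """Return the soft limit on open descriptors (what `ulimit -n` reports)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def close_logged(fd: int) -> bool:
    """Close fd; log the failure instead of ignoring it. True on success."""
    try:
        os.close(fd)
        return True
    except OSError as exc:
        # Without this log line, a failed close goes unnoticed.
        log.error("close(%d) failed: %s", fd, exc)
        return False
```

Note that this only catches a close that returns an error. A close that hangs inside the filesystem never returns at all, so spotting that case would need something extra, such as watching the number of open descriptors against `fd_limit()`.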
The filesystem hanging is unlikely to be a bug. The filesystems you'd realistically use in conjunction with Kubernetes are pretty heavily tested. More likely it is supposed to hang under whatever conditions led to that happening.
And, sure, maybe you'll eventually want to determine why the filesystem has moved into that failure state, but the most pressing problem is that your app is crashing. All the work you put into gracefully handling the failure is going to waste.