by jcampbell1
4293 days ago
If all three machines had the same logging code, and one machine was fsync'ing slowly, isn't turning off fsync just a band-aid that hides the true problem? When they discover the actual problem is some issue with the RAID controller, I promise not to say "I told you so".
Unrelated write activity on a filesystem can make the latency of fsync() calls in any other process on that filesystem vary wildly. This is easy to reproduce; here's an experiment for you. First, run this:
strace -T -efsync ruby -e'loop { STDOUT.fsync; puts "a" * 120; sleep 0.1 } ' > ~/somefile
Then, in another terminal, do a little bit of writing on the same filesystem. For example:
dd if=/dev/zero of=~/someotherfile bs=4M count=1
On my poor little AWS VM, here is what I see:
fsync(1) = 0 <0.025072>
fsync(1) = 0 <3.930661>
fsync(1) = 0 <0.024810>
That is, writing 4 megabytes in an unrelated process made fsync() latency jump by two orders of magnitude (from about 25 ms to almost 4 s).
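If strace isn't handy, roughly the same measurement can be taken from inside the process. This is a minimal sketch; the method names timed_fsync, sample_fsync_latencies and spread are hypothetical and not part of the original experiment. Each iteration appends a 120-byte line, hands it to the kernel with IO#flush, and then times only the IO#fsync call with a monotonic clock:

```ruby
require "tempfile"

# Hypothetical helper: append a line, push Ruby's buffer to the kernel,
# then return how long fsync(2) alone took, in seconds.
def timed_fsync(io, line)
  io.write(line)
  io.flush # write(2) happens here, outside the timed region
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  io.fsync
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Hypothetical helper: take `count` fsync samples, `interval` seconds apart.
# The probe file lives in `dir` so it shares a filesystem with whatever else
# writes there; it is deleted when the block returns.
def sample_fsync_latencies(dir: Dir.home, count: 50, interval: 0.1)
  Tempfile.create("fsync-probe", dir) do |f|
    Array.new(count) do
      latency = timed_fsync(f, "a" * 120 + "\n")
      sleep interval
      latency
    end
  end
end

# Ratio of slowest to fastest sample; around 100 means two orders of magnitude.
def spread(latencies)
  latencies.max / latencies.min
end
```

Assuming it is saved as fsync_probe.rb, something like `ruby -r ./fsync_probe.rb -e 'l = sample_fsync_latencies; puts l.max, spread(l)'` can be run while the dd command writes in another terminal.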
Removing fsync() is an appropriate fix because we don't really ever want to flush this data to durable storage.
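In Ruby terms, the distinction is between IO#flush, which only moves Ruby's buffer into the kernel page cache via write(2), and IO#fsync, which additionally waits, via fsync(2), for the data to reach the storage device. A minimal sketch, assuming a hypothetical LogWriter class (not from the original discussion) that keeps fsync behind an opt-in flag:

```ruby
# Hypothetical log writer: every line is handed to the kernel with write(2)
# via IO#flush; fsync(2) is issued only when durable: true is requested.
class LogWriter
  def initialize(path, durable: false)
    @io = File.open(path, "a")
    @durable = durable
  end

  def log(line)
    @io.write(line.end_with?("\n") ? line : "#{line}\n")
    @io.flush              # userspace buffer -> page cache; usually doesn't wait on the disk
    @io.fsync if @durable  # page cache -> storage; can stall behind unrelated writeback
  end

  def close
    @io.close
  end
end
```

Lines written without fsync still reach disk when the kernel writes back dirty pages; what is given up is only the guarantee that they survive a crash or power loss.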