Hacker News new | ask | show | jobs
by eatonphil 1127 days ago
And I guess the idea is furthermore that just because you don't fsync after every write does not mean that you haven't fsync-ed before responding to the user request saying the data is stored durably. I assume that you do actually guarantee the flush before returning a success to the user.
2 comments

Yes, this is exactly how it works. It effectively batches the sync part of multiple user requests, so none of those user requests are ack-ed until the sync completes. It is a low-latency analogue to checkpointing storage write-backs to minimize the number of round-trips to the I/O syscalls.
exactly. you improve latency and throughput at the same time. is kinda cool. by delaying the reponses just a small bit you get huge benefits
Wrong. Use something like aio or io_uring to submit 4 asynchronous writes in a single system call and you'll get way better performance. The kernel has all kinds of infrastructure that tries to coalesce and batch things that the write()+fsync() syscalls make horribly inefficient, and in the modern world of really fast nvme drives, you want to make your calls into the device driver as efficient as possible. You'll burn far fewer CPU cycles by giving the kernel the whole set of i/os in one single go, burn less on synchronization. It really is better to avoid write() + fsync().
what makes you think that we haven't tested this? seastar's io engine is defaulted to io-uring... there are about 10 things here to comment on. on optimized kernels we disable block coalescing at the kernel level, second we tell the kernel to use fifo, etc. these low hanging fruit was already something we've done for a very very long time.