| Hi Antirez, thanks again for Redis. Despite our few problems with it, it rocks. A few comments: > With fsync everysec, there is the problem that from time to time we need to fsync. Guess what? Even if we fsync in a different thread, write(2) will block anyway Yep, but this could be avoided if a thread was devoted to all I/O incl. write() (and then line-level buffering really would be possible as well). Communication with this thread would be on a thread-safe queue--the main thread would never block on disk I/O, and only two threads would mean mutex contention for the queue lock would be low. This would be one solution, correct? This is a variation of your "two processes + pipe" suggestion. > How to fix that? For now we introduced in Redis 2.2 an option that will not fsync the AOF file while writing IF there is a compaction in progress. Well, we enabled that, but we found that it's still a problem in a couple of circumstances: 1. Something other than the AOF recompaction makes the disk busy. Like, say, even a moderate amount of disk activity by another process. 2. Redis's own logging to stdout, if redirected to a file, can itself cause the Redis main thread to block if stdout is being flushed onto a busy disk. Basically, if any I/O which may hit a disk (AOF record/flush or even logging) is being done on the single epoll-driven thread Redis uses to process incoming requests, the system must make very good guarantees that those I/O calls will not block. We have found these guarantees practically impossible to make on a very busy master, so we've given up on having the master do AOF work altogether. |
Exactly, the logging can well be moved to a thread for better performance, thanks for the hint!
About the other scenarios where fsync will perform poorly: indeed, any other disk I/O is going to be a problem.
I guess "all the AOF business in a different thread" is probably the most sensible approach to follow, unless there is a syscall (even a Linux-specific one) that can force a commit of old data without blocking.
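
To make the "dedicated I/O thread plus thread-safe queue" idea concrete, here is a minimal sketch in C with pthreads. It is not Redis code: the names io_queue, io_queue_start, io_queue_push and io_queue_stop are hypothetical, error handling is mostly omitted, and the queue is unbounded, so a real version would need some limit for when the disk falls behind. The point it shows is that the main thread only copies bytes and takes a short lock, while write(2) and fsync(2) happen in the other thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* One pending append: a private copy of the bytes to write. */
typedef struct io_job {
    struct io_job *next;
    size_t len;
    char data[];                /* flexible array member */
} io_job;

/* Hypothetical queue shared by the main thread and the I/O thread. */
typedef struct io_queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    io_job *head, *tail;
    int fd;                     /* file the I/O thread appends to */
    int closing;                /* set by io_queue_stop() */
    pthread_t thread;
} io_queue;

/* I/O thread: the only place where write(2) and fsync(2) are called. */
void *io_thread_main(void *arg) {
    io_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->head == NULL && !q->closing)
            pthread_cond_wait(&q->nonempty, &q->lock);
        io_job *job = q->head;
        if (job == NULL) {      /* closing and fully drained */
            pthread_mutex_unlock(&q->lock);
            break;
        }
        q->head = job->next;
        if (q->head == NULL) q->tail = NULL;
        pthread_mutex_unlock(&q->lock);

        /* Blocking I/O happens outside the lock. */
        size_t off = 0;
        while (off < job->len) {
            ssize_t n = write(q->fd, job->data + off, job->len - off);
            if (n <= 0) break;  /* real code would inspect errno */
            off += (size_t)n;
        }
        free(job);
    }
    fsync(q->fd);               /* result ignored in this sketch */
    return NULL;
}

int io_queue_start(io_queue *q, int fd) {
    q->head = q->tail = NULL;
    q->fd = fd;
    q->closing = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    return pthread_create(&q->thread, NULL, io_thread_main, q);
}

/* Called from the main thread: copies the data, never touches the disk. */
int io_queue_push(io_queue *q, const void *buf, size_t len) {
    assert(q != NULL && buf != NULL);
    io_job *job = malloc(sizeof(*job) + len);
    if (job == NULL) return -1;
    job->next = NULL;
    job->len = len;
    memcpy(job->data, buf, len);

    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = job; else q->head = job;
    q->tail = job;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Drains whatever is still queued, then joins the I/O thread. */
void io_queue_stop(io_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->closing = 1;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->thread, NULL);
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->nonempty);
}
```

In this sketch fsync is only called once at shutdown to keep it short; an "everysec" policy would instead call it from the I/O thread on a timer. Either way the main thread's cost per append is a malloc, a memcpy and a brief lock, regardless of how busy the disk is.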