
Let's compare Redis's AOF mode with InnoDB. The way that InnoDB manages to give this guarantee is by flushing its log to disk on every transaction commit. If you're willing to sacrifice some durability for write speed, this can be relaxed (that's what the `innodb_flush_log_at_trx_commit` setting is for). In Redis, the closest equivalent would be running in AOF mode with a flush on every write (`appendfsync always`).

The difference here is not one of durability but of how the data is stored on disk. InnoDB keeps the logs small by periodically updating a B-tree with the changes in the logs, after which those changes can safely be removed from the logs. The result is strong durability, a reasonably compact on-disk representation, and fairly fast recovery when someone trips over the power cord.
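To make the "apply the log, then trim it" idea concrete, here's a rough sketch in Python. This is a toy, not how InnoDB actually does it: `ToyStore`, `redo.log`, and `data.json` are names I made up, and a JSON file stands in for the B-tree.

```python
import json
import os


class ToyStore:
    """Toy write-ahead log with checkpointing (illustration only, not InnoDB)."""

    def __init__(self, directory):
        # Hypothetical file names, chosen for this sketch.
        self.log_path = os.path.join(directory, "redo.log")
        self.data_path = os.path.join(directory, "data.json")
        self.data = {}
        self.recover()

    def recover(self):
        # Load the last checkpoint, then replay whatever is still in the log.
        if os.path.exists(self.data_path):
            with open(self.data_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def commit(self, key, value):
        # The log record is fsynced before the commit returns.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def checkpoint(self):
        # Persist the current state, then drop the log records it covers.
        tmp_path = self.data_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.data_path)
        with open(self.log_path, "w") as f:  # opening with "w" truncates the log
            f.flush()
            os.fsync(f.fileno())
```

Recovery only has to replay what was committed since the last checkpoint, which is why the log stays small and restarts stay fast.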

Redis, in AOF mode, logs every write command to the log file and (if you specify it in the config file) flushes to disk after every write. The problem is that this file grows without bound: if you leave Redis running forever, it will eventually fill up your hard drive, and recovering from a restart will take way too damn long if you have to replay a 1 TB log file. The conventional way of dealing with this is to periodically run the BGREWRITEAOF command, which does essentially the same thing as a background data dump: it writes out a new AOF file from the current contents of Redis in memory, and replaces the old AOF file with it. This is roughly equivalent to augmenting the usual periodic-data-dump behavior of Redis with logs that cover the writes between dumps, just like a more conventional database.
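Here's a rough Python sketch of that log-then-rewrite cycle. It is a toy, not Redis's implementation or file format: `ToyAOF` and its one-command-per-line format are made up for illustration, and it assumes keys contain no spaces and values contain no newlines.

```python
import os


class ToyAOF:
    """Toy append-only command log with rewrite (illustration only, not Redis)."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            self.replay()

    def replay(self):
        # Recovery cost is proportional to the number of logged commands.
        with open(self.path) as f:
            for line in f:
                op, key, *rest = line.rstrip("\n").split(" ", 2)
                if op == "SET":
                    self.data[key] = rest[0]
                elif op == "DEL":
                    self.data.pop(key, None)

    def _append(self, command):
        with open(self.path, "a") as f:
            f.write(command + "\n")
            f.flush()
            os.fsync(f.fileno())  # flush to disk on every write

    def set(self, key, value):
        self._append(f"SET {key} {value}")
        self.data[key] = value

    def delete(self, key):
        self._append(f"DEL {key}")
        self.data.pop(key, None)

    def rewrite(self):
        # Write one SET per live key to a new file, then swap it in.
        tmp_path = self.path + ".rewrite"
        with open(tmp_path, "w") as f:
            for key, value in self.data.items():
                f.write(f"SET {key} {value}\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)
```

Overwrite one key a million times and the log holds a million lines until `rewrite()` runs, after which it holds one. The real thing does the rewrite in a background process while still accepting writes, which this sketch skips entirely.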

If there's something I'm missing here, I'd love to hear it.