| > > you lose the durability guarantee that makes databases useful. ... the data might still be sitting in kernel buffers, not yet written to stable storage. > No! That's because you stopped using fsync. It's nothing to do with your code being async. From that section, it sounds like OP was tossing data into the io_uring submition queue and calling it "done" at that point (ie: not waiting for the io_uring completion queue to have the completion indicated). So yes, fsync is needed, but they weren't even waiting for the kernel to start the write before indicating success. I think to some extent things have been confused because io_uring has a completion concept, but OP also has a separate completion concept in their dual wal design (where the second WAL they call the "completion" WAL). But I'm not sure if OP really took away the right understanding from their issues with ignoring io_uring completions, as they then create a 5 step procedure that adds one check for an io_uring completion, but still omits another. > 1. Write intent record (async) > 2. Perform operation in memory > 3. Write completion record (async) > 4. Wait for the completion record to be written to the WAL > 5. Return success to client Note the lack of waiting for the io_uring completion of the intent record (and yes, there's still not any reference to fsync or alternates, which is also wrong). There is no ordering guarantee between independent io_urings (OP states they're using separate io_uring instances for each WAL), and even in the same io_uring there is limited ordering around completions (IOSQE_IO_LINK exists, but doesn't allow traversing submission boundaries, so won't work here because OP submits the work a separate times. They'd need to use IOSQE_IO_DRAIN which seems like it would effectively serialize their writes. which is why It seems like OP would need to actually wait for completion of the intent write). |