The disk controller may decide to write out blocks in a different order than the logical layout in the log file itself, and be interrupted before completing this work.
Just wondering how SQLite would ever work if it had zero control over this. Surely there must be some "flush" operation that guarantees that everthing so far is written to disk? Otherwise, any "old" block that contains data might have not been written. SQLite says:
> Local devices also have a characteristic which is critical for enabling database management software to be designed to ensure ACID behavior: When all process writes to the device have completed, (when POSIX fsync() or Windows FlushFileBuffers() calls return), the filesystem then either has stored the "written" data or will do so before storing any subsequently written data.
A "flush" command does indeed exist... but disk and controller vendors are like patients in Dr. House [1] - everybody lies. Especially if there are benchmarks to be "optimized". Other people here have written up that better than I ever could [2].
It’s worth noting this is also dependent on filesystem behavior; most that do copy-on-write will not suffer from this issue regardless of drive behavior, even if they don’t do their own checksumming.
NVMe drives do their own manipulation of the datastream. Wear leveling, GC, trying to avoid rewriting an entire block for your 1 bit change, etc. NVMe drives have CPUs and RAM for this purpose; they are full computers with a little bit of flash memory attached. And no, of course they're not open source even though they have full access to your system.
> Local devices also have a characteristic which is critical for enabling database management software to be designed to ensure ACID behavior: When all process writes to the device have completed, (when POSIX fsync() or Windows FlushFileBuffers() calls return), the filesystem then either has stored the "written" data or will do so before storing any subsequently written data.