|
|
|
|
|
by jasonwatkinspdx
2104 days ago
|
|
> The filesystem does provide guarantees when you use fsync() and fdatasync(). Postgres relies on these guarantees. So does RocksDB. Pebble's usage of fsync/fdatasync mirrors RocksDB's. Our crash testing is not testing the filesystem guarantees, only that we're correctly using fsync/fdatasync (which is hard enough to get right). For anyone unfamiliar, fsync/fdatasync are infamous for all sorts of subtle sharp edges: https://www.usenix.org/conference/atc20/presentation/rebello Having synchronous replication via paxos/raft can mitigate a lot of this. |
|
From the Rebello paper: > However, on restart, since the log entry is in the page cache, LevelDB includes it while creating an SSTable from the log file.
Pebble and RocksDB both inherited this behavior. The nuance here is that the sstable is then synced to disk and no reads are served until the sync is successful. If the machine were to crash before the sstable was synced, upon restart we'd rollback to the durable prefix of the log.