by mrlongroots
409 days ago
Some examples off the top of my head:
- You can reason about block offsets. If your writes are 512B-aligned, you are assured of minimal write amplification.
- If your writes are append-only, log-structured, that makes SSD compaction a lot more straightforward
- No caching guarantees by default. Again, even SSDs cache writes. Block writes are not atomic even with SSDs. The only way to guarantee atomicity is via write-ahead logs.
- The NVMe layer exposes async submission/completion queues, to control the io_depth the device is subjected to, which is essential to get max perf from modern NVMe SSDs. Although you need to use the right interface to leverage it (libaio/io_uring/SPDK).
|
Not all devices use 512-byte sectors, and that size is mostly a relic from low-density spinning rust.
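To make the sector-size point concrete, here is a minimal sketch that expands a write to block boundaries for a given block size instead of assuming 512. The names `align_span` and `logical_block_size` are hypothetical. The sysfs path is Linux-specific, and it only reports what the kernel sees, which on a virtual or SAN-backed volume may say little about the real media.

```python
from typing import Tuple


def align_span(offset: int, length: int, block_size: int) -> Tuple[int, int]:
    """Expand [offset, offset + length) to the smallest block-aligned span.

    Returns (aligned_offset, aligned_length). block_size is an input here;
    nothing in this function assumes 512.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    mask = block_size - 1
    start = offset & ~mask                  # round start down
    end = (offset + length + mask) & ~mask  # round end up
    return start, end - start


def logical_block_size(device: str) -> int:
    """Read the logical block size the Linux kernel reports for a device.

    `device` is a name such as "sda" or "nvme0n1" (assumption: Linux with
    sysfs mounted at /sys).
    """
    with open(f"/sys/block/{device}/queue/logical_block_size") as f:
        return int(f.read().strip())
```

With a 4096-byte block size, `align_span(512, 512, 4096)` returns `(0, 4096)`: a write that is perfectly 512B-aligned still touches a whole 4 KiB block.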
> If your writes are append-only, log-structured, that makes SSD compaction a lot more straightforward
Hum, no. Your volume may be a sparse file on a SAN system; in fact, that is often the case in cloud environments. Also, most cached RAID controllers may behave differently here. Unless you know exactly what you're targeting, you're shooting blind.
> No caching guarantees by default. Again, even SSDs cache writes. Block writes are not atomic even with SSDs. The only way to guarantee atomicity is via write-ahead logs.
Not even that way. Most server-grade controllers (with battery) will ack an fsync immediately, even if the data is not on disk yet.
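For reference, a minimal sketch of the write-ahead-log pattern being discussed, assuming a simple length-plus-CRC32 framing. The names `HEADER`, `append_record` and `replay` are hypothetical, not any particular database's format. Note that `os.fsync` only asks the stack below to persist the data; whether it is actually on stable media when the call returns depends on the controller and any cache in between, which is exactly the caveat above.

```python
import os
import struct
import zlib
from typing import List

# Frame header: payload length and CRC32 of the payload (hypothetical layout).
HEADER = struct.Struct("<II")


def append_record(path: str, payload: bytes) -> None:
    """Append one framed record, then request a flush to stable storage."""
    frame = HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    with open(path, "ab") as f:
        f.write(frame)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to flush; lower layers may still cache


def replay(path: str) -> List[bytes]:
    """Return every complete, checksum-valid record; stop at a torn tail."""
    with open(path, "rb") as f:
        data = f.read()
    records = []
    pos = 0
    while pos + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, pos)
        start = pos + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # incomplete or corrupt record: treat as never written
        records.append(payload)
        pos = start + length
    return records
```

The checksum lets recovery detect a torn record and ignore it, which gives all-or-nothing behaviour per record. It cannot help if the device acknowledged a flush and then lost the data.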
> The NVMe layer exposes async submission/completion queues, to control the io_depth the device is subjected to, which is essential to get max perf from modern NVMe SSDs.
That's the storage domain, not the application domain. In most cloud systems, you have the choice of using direct-attached storage (usually with a proper controller, so what is exposed is actually the controller's features, not the individual NVMe queues), or SAN storage: a sparse file on a filesystem on a system at the other end of a TCP endpoint. One of those provides easy backups, redundancy, high availability and snapshots; with the other, you roll your own.