by isotopp
1827 days ago
With NVMe you can get around 800,000 IOPS from a single device, but because of per-request latency you only get around 20,000 IOPS if you issue requests one after another. You need to talk to the device with deep queues or with multiple concurrent threads in order to eat the entire IOPS buffet. Traditional OLTP workloads do not tend to have the concurrency to actually saturate the NVMe device. You would need to be 40-way parallel, but most OLTP workloads give you 4-way. Multiple instances per device are almost a must.
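The arithmetic behind the 40-way figure can be sketched in a few lines of Python. This is a back-of-the-envelope model, not a benchmark: the function names are made up for illustration, and it assumes per-request latency stays constant as concurrency grows, which real devices only approximate.

```python
def required_concurrency(device_iops: float, serial_iops: float) -> float:
    """Requests that must be in flight to reach the device's peak IOPS.

    One request at a time means latency = 1 / serial_iops, so by Little's
    law the needed concurrency is device_iops * latency.
    """
    return device_iops / serial_iops


def achievable_iops(concurrency: int, device_iops: float, serial_iops: float) -> float:
    """IOPS at a given concurrency, assuming constant per-request latency."""
    return min(concurrency * serial_iops, device_iops)


# Figures from the comment above.
DEVICE_IOPS = 800_000
SERIAL_IOPS = 20_000

if __name__ == "__main__":
    print(f"per-request latency: {1e6 / SERIAL_IOPS:.0f} microseconds")
    print(f"concurrency needed:  {required_concurrency(DEVICE_IOPS, SERIAL_IOPS):.0f}")
    share = achievable_iops(4, DEVICE_IOPS, SERIAL_IOPS) / DEVICE_IOPS
    print(f"4-way workload uses: {share:.0%} of the device")
```

With these numbers a 4-way workload tops out at about a tenth of what the device can do, which is why several instances per device make sense.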
|
On top-line gear this isn't an issue: those devices don't signal a volatile write cache (by virtue of either having a non-volatile cache or enough of a power reserve to flush the cache), which then spares the OS from actually doing the more expensive flush work for fdatasync()/O_DSYNC. One can also manually ignore the need for cache flushes by changing /sys/block/nvme*/queue/write_cache to say "write through", but that obviously loses guarantees. It can still be useful for testing on lower-end devices.
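To see what your own devices report, here is a minimal read-only sketch in Python. The helper name `write_cache_modes` is made up for illustration, and it assumes a Linux sysfs layout like the path above; it only reads the attribute and never changes it.

```python
from pathlib import Path


def write_cache_modes(sys_block: str = "/sys/block") -> dict[str, str]:
    """Map each NVMe block device to the contents of its queue/write_cache file."""
    modes = {}
    for attr in sorted(Path(sys_block).glob("nvme*/queue/write_cache")):
        device = attr.parent.parent.name  # e.g. "nvme0n1"
        modes[device] = attr.read_text().strip()
    return modes


if __name__ == "__main__":
    for device, mode in write_cache_modes().items():
        # The kernel reports "write back" or "write through" here.
        print(f"{device}: {mode}")
```

A device showing "write back" is one where the kernel will issue cache flushes for fdatasync()/O_DSYNC. Writing "write through" into the file as root turns those flushes off, with the loss of guarantees described above.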