by ryanworl
2714 days ago
This technique is just more IO parallelism at the physical layer due to higher concurrency while submitting IO, correct? Since NVMe and newer SSDs don't hit peak throughput until very high queue depths, this doesn't surprise me.
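To make the queue-depth point concrete, here is a minimal sketch, assuming a thread pool is an acceptable stand-in for real asynchronous submission. The name `read_blocks`, the `queue_depth` parameter, and the 4 KiB `BLOCK_SIZE` are hypothetical, chosen for illustration. Each worker issues one blocking `os.pread`, so the pool size bounds how many reads are outstanding at once.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def read_blocks(path, queue_depth):
    """Read all of `path`, keeping up to `queue_depth` reads in flight."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offsets = range(0, size, BLOCK_SIZE)
        # Each worker performs one blocking positional read; the pool size
        # caps the number of requests outstanding at any moment.
        with ThreadPoolExecutor(max_workers=queue_depth) as pool:
            # map() yields results in offset order, so join() rebuilds the file.
            blocks = pool.map(lambda off: os.pread(fd, BLOCK_SIZE, off), offsets)
            return b"".join(blocks)
    finally:
        os.close(fd)
```

Calling `read_blocks(path, 1)` and `read_blocks(path, 32)` returns the same bytes; only the number of in-flight requests differs. These are buffered reads, so on a file already in the page cache you should not expect to see a difference. A real engine would more likely use direct IO with an asynchronous interface rather than threads.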
A fast storage engine needs to eliminate most of the elements that will stall an execution pipeline. This means doing things like almost completely eliminating shared data structures and context switching. It also means designing your own execution and I/O scheduler to greatly reduce the various forms of memory stalls that are ubiquitous in many designs. It is difficult to overstate the extent to which thoughtful scheduler design can improve throughput.
A state-of-the-art storage engine can drive 2+ GB/s per core, and schedule things to keep the storage hardware close to its theoretical performance while smoothing out transients. In my experience, it is very easy to run out of storage bandwidth.
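A back-of-the-envelope sketch of why bandwidth runs out so quickly. The 2 GB/s per-core figure is the one quoted above; the 7 GB/s device figure and the function name `cores_to_saturate` are hypothetical, picked only to show the arithmetic.

```python
import math


def cores_to_saturate(device_gb_per_s, per_core_gb_per_s):
    """Smallest number of cores whose combined rate meets the device bandwidth."""
    return math.ceil(device_gb_per_s / per_core_gb_per_s)


# Hypothetical drive rated at 7 GB/s, engine driving 2 GB/s per core.
print(cores_to_saturate(7.0, 2.0))
```

Under those assumed numbers, a handful of cores is enough to saturate one drive, so on a machine with many cores the storage becomes the limit well before the CPU does.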