by ayende
289 days ago
This is wrong, because your mmap code is being stalled on page faults (including soft page faults, which you get when the data is already in memory but not yet mapped into your process). The io_uring code looks like it does all the fetch work in the background (with 6 threads), then just hands the completed buffers to the counter. Do the same for mmap: have 6 threads read the first byte of each page and then hand that page section to the counter, and you'll find similar performance. You can also use madvise and huge pages to control the mmap behavior.
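To make the suggestion concrete, here is a rough sketch of the "touch first, count later" shape in Python. It is not the benchmark's code: the function name `count_byte_prefaulted`, the `CHUNK` size, and counting a single byte value are all stand-ins I made up for illustration. In CPython the GIL will likely serialize much of this, so treat it as a picture of the structure rather than something that reproduces the numbers; a C version with pthreads would be the fair comparison.

```python
import mmap
import os
import queue
import threading

PAGE = mmap.PAGESIZE
CHUNK = 64 * PAGE  # hypothetical size of the "page section" handed to the counter


def count_byte_prefaulted(path, needle=b"\n", workers=6):
    """Count a single-byte needle while helper threads take the page faults first."""
    size = os.path.getsize(path)
    if size == 0:
        return 0  # mmap cannot map an empty file

    todo = queue.Queue()   # sections waiting to be touched
    ready = queue.Queue()  # sections whose pages have been touched
    for start in range(0, size, CHUNK):
        todo.put((start, min(start + CHUNK, size)))
    n_sections = todo.qsize()

    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Optional read-ahead hint; only available on some platforms (Python 3.8+).
        if hasattr(mm, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
            mm.madvise(mmap.MADV_WILLNEED)

        def toucher():
            while True:
                try:
                    start, end = todo.get_nowait()
                except queue.Empty:
                    return
                # Read the first byte of every page so the fault happens here,
                # not in the counting loop.
                for off in range(start, end, PAGE):
                    _ = mm[off]
                ready.put((start, end))

        threads = [threading.Thread(target=toucher) for _ in range(workers)]
        for t in threads:
            t.start()

        total = 0
        for _ in range(n_sections):
            start, end = ready.get()  # sections arrive in completion order
            total += mm[start:end].count(needle)

        for t in threads:
            t.join()
    return total
```

Because the needle is a single byte, a match can never straddle two sections, so the order in which sections arrive does not matter. A multi-byte pattern would need overlap handling at section boundaries.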
Even if you could somehow connect a million SSDs to a single machine, you would not outperform memory, because the data has to be read into memory first, and only then can the CPU process it.
Basic `perf stat` and minor/major faults should be a first-line diagnostic.
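If you want the fault counts from inside the process rather than from `perf stat`, something like the following should work on Linux and other Unix systems. It is only a sketch: `fault_delta` is a hypothetical helper name, and it relies on `resource.getrusage`, which reports the process-wide minor (`ru_minflt`) and major (`ru_majflt`) fault counters, so faults taken by other threads in the same process are included.

```python
import resource


def fault_delta(fn, *args, **kwargs):
    """Run fn and return (result, minor_faults, major_faults) taken during the call."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = fn(*args, **kwargs)
    after = resource.getrusage(resource.RUSAGE_SELF)
    minor = after.ru_minflt - before.ru_minflt  # served without disk I/O
    major = after.ru_majflt - before.ru_majflt  # required disk I/O
    return result, minor, major
```

Wrapping each variant of the benchmark in this and comparing the two numbers would show whether the mmap path is paying for faults that the io_uring path avoids.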