by nicioan 1972 days ago
Excellent article, thank you! I really like the analysis and profiling part of the evaluation. I also have some experience with I/O performance on Linux -- we measured 30GiB/s in a PCIe Gen3 box (shameless plug[0]).

I have one question / comment: did you use multiple jobs for the BW (large IO) experiments? If yes, did you set randrepeat to 0? I'm asking because fio by default uses the same sequence of offsets for each job, in which case data might be reused across jobs. I verified that with blktrace a few years back, but it might have changed since.
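To illustrate the concern, here is a minimal Python sketch. It is not fio's actual random generator or seeding logic; `job_offsets`, `SHARED_SEED`, and the device and job sizes are all hypothetical. It only shows that jobs drawing offsets from identically seeded generators touch the same blocks, while jobs with distinct seeds mostly do not.

```python
import random

def job_offsets(seed, n_ios, device_blocks, block_size=4096):
    """Return the byte offsets one hypothetical job would access."""
    rng = random.Random(seed)  # stand-in generator, not fio's
    return [rng.randrange(device_blocks) * block_size for _ in range(n_ios)]

def unique_fraction(per_job):
    """Fraction of all issued offsets that are distinct across jobs."""
    all_offsets = [off for job in per_job for off in job]
    return len(set(all_offsets)) / len(all_offsets)

SHARED_SEED = 1234         # hypothetical fixed seed shared by every job
DEVICE_BLOCKS = 1_000_000  # hypothetical device size in 4 KiB blocks
N_IOS = 1000               # I/Os issued per job
N_JOBS = 4

# Every job seeded identically: all jobs walk the same offset sequence.
same_seed = [job_offsets(SHARED_SEED, N_IOS, DEVICE_BLOCKS) for _ in range(N_JOBS)]

# Every job seeded differently: offsets rarely overlap on a large device.
distinct_seed = [job_offsets(SHARED_SEED + j, N_IOS, DEVICE_BLOCKS) for j in range(N_JOBS)]

print("unique fraction, shared seed:   ", unique_fraction(same_seed))
print("unique fraction, distinct seeds:", unique_fraction(distinct_seed))
```

With a shared seed, at most one offset in four is new, so three of the four jobs are re-reading blocks another job already requested.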

[0]https://www.usenix.org/conference/fast19/presentation/kourti...

edit: fixed typo

1 comment

Looks interesting! I wonder whether there'd be interesting new database applications on NVMe when doing I/Os as small as 512 bytes (with a more efficient "IO engine" than Linux bio, which has too much CPU overhead for such small requests).

I mean, currently OLTP RDBMS engines tend to use 4k, 8k (and some 16k) block sizes, even when doing completely random I/O (say, traversing an index on customer_id that now needs to read occasional customer orders scattered across years of history). So you may end up reading 1000 x 8 kB blocks just to read 1000 x 100B order records "randomly" scattered across the table from inserts done over the years.
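The arithmetic behind that example can be sketched in a few lines of Python. The assumptions here are mine: 8 kB is taken as 8192 bytes, each record sits in its own block, and no 100B record straddles an I/O boundary (with 512B units some would, which would raise the small-I/O figure a bit). The function name `read_amplification` is just for illustration.

```python
def read_amplification(n_records, record_bytes, io_bytes):
    """Return (bytes_read, useful_bytes, amplification) for one I/O per record.

    Assumes each record lives in a different I/O unit and never
    straddles a unit boundary.
    """
    bytes_read = n_records * io_bytes
    useful_bytes = n_records * record_bytes
    return bytes_read, useful_bytes, bytes_read / useful_bytes

N_RECORDS = 1000
RECORD_BYTES = 100

for io_bytes in (8192, 512):
    read, useful, amp = read_amplification(N_RECORDS, RECORD_BYTES, io_bytes)
    print(f"{io_bytes:>5} B I/O: read {read} B for {useful} B of records "
          f"({amp:.2f}x amplification)")
```

Under these assumptions the 8 kB case reads about 82 bytes for every useful byte, while 512B I/Os bring that down to roughly 5.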

As I understand it, Optane persistent memory can do small, cache-line-sized I/O, but that's a different topic. Being able to do random 512B I/O efficiently on "commodity" NVMe SSDs would open up some interesting opportunities for retrieving records that are scattered "randomly" across the disks.

edit: to answer your question, I used 10 separate fio commands with numjobs=3 or 4 for each, and randrepeat was left at its default.