> If the user applications are going to request huge pages using mmap system
call, then it is required that system administrator mount a file system of
type hugetlbfs:
Note that hugetlbfs otherwise has semantics similar to tmpfs; notably, its use is mutually exclusive with being able to supply a disk-backed file fd to mmap!
On BSD, read() was already implemented in the kernel by page-faulting in the desired pages of the file and then copying them into the user-supplied buffer. So from the first time mmap was ever implemented, it was always the fastest input mechanism. (The first deployed implementation was in SunOS btw; 4.2BSD specified and documented it but didn't implement it.) Anyway, there's no magic to get data off a device into memory faster; io_uring just lets you hide the delay in some other thread's time.
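For anyone who wants to see the two input paths side by side, here is a rough Python sketch, not a benchmark. The helper names `sum_with_read` and `sum_with_mmap` and the chunk size are made up for illustration. The first function has the kernel copy each chunk into a fresh buffer; the second touches the mapped pages directly through a `memoryview`, so pages get faulted in on first access.

```python
import mmap


def sum_with_read(path, chunk_size=1 << 16):
    """Sum every byte of the file using read(): each call copies a chunk
    from the page cache into a new bytes object."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty result means end of file
                break
            total += sum(chunk)
    return total


def sum_with_mmap(path):
    """Sum every byte of the file through a read-only mapping: pages are
    faulted in on first touch, with no explicit copy into a user buffer.
    Assumes the file is non-empty (an empty file cannot be mapped)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # The memoryview gives zero-copy access to the mapping and
            # iterates as integers; it must be released before the mmap closes.
            with memoryview(m) as view:
                return sum(view)
```

Both functions should return the same number for the same file; which one is faster depends on the file, the page cache state, and the access pattern, so measure before drawing conclusions.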
mmap is slow because stalling on page faults is slow. Your process stalls and sits around doing nothing instead of processing data you've read already. You can google the benchmarks if you like. io_uring wasn't built just for kicks.
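To make the "hide the delay somewhere else" idea concrete, here is a minimal sketch that uses a plain helper thread and a bounded queue. This is not io_uring, only the same overlap idea in portable Python; the names `read_chunks_in_background` and `checksum`, the chunk size, and the queue depth are all assumptions for illustration. The helper thread blocks in `read()` while the main thread keeps processing chunks that have already arrived.

```python
import queue
import threading


def read_chunks_in_background(path, chunk_size=1 << 16, depth=4):
    """Yield chunks of the file while a helper thread reads ahead.
    At most `depth` chunks are buffered at any time."""
    chunks = queue.Queue(maxsize=depth)

    def reader():
        try:
            with open(path, "rb", buffering=0) as f:
                while chunk := f.read(chunk_size):
                    chunks.put(chunk)  # blocks when the consumer falls behind
        finally:
            chunks.put(b"")  # sentinel: end of file (or the reader failed)

    # Daemon thread so an abandoned generator cannot keep the process alive.
    threading.Thread(target=reader, daemon=True).start()

    while True:
        chunk = chunks.get()
        if not chunk:
            return
        yield chunk


def checksum(path):
    """Example consumer: sum all bytes while the next chunks are being read."""
    return sum(sum(chunk) for chunk in read_chunks_in_background(path))
```

The consumer only waits when the queue is empty, so as long as processing a chunk takes about as long as reading one, most of the I/O latency is spent in the helper thread rather than in the main one.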