by kragen
231 days ago
If you want accessing a particular page to cause a SIGSEGV so your custom fault handler gets invoked, you can just munmap it, converting that access from a "non-failing page fault" into one "deemed to be invalid". Then the mechanism I described would "allow[] concurrency of page faults, so the [userspace threading library] is able to perform concurrent reads against the underlying storage media", as long as you were aggressive enough about unmapping pages that none of your still-mapped pages got swapped out by the kernel. (Or you could use mlock(), maybe.)
I tried implementing your "hint_read" years ago in userspace in a search engine I wrote, by having a "readahead thread" read from pages before the main thread got to them. It made it slower, and I didn't know enough about the kernel to figure out why. I think I could probably make it work now, and Linux's mmap implementation has improved enormously since then, so maybe it would just work right away.
|
Presumably having fine-grained mmaps will be another source of overhead. Not to mention that each mmap requires another system call. Instead of a single fault or a single call to `readv`, you're doing many `mmap` calls.
> I tried implementing your "hint_read" years ago in userspace in a search engine I wrote, by having a "readahead thread" read from pages before the main thread got to them.
Yeah, doing it in another thread will also have quite a bit of overhead. You need some sort of synchronisation with the other thread, and ultimately the "readahead" thread will need to induce the disk reads through something other than a page fault to achieve concurrent reads: within the readahead thread the page faults are still synchronous, and it has no way of knowing what the future page faults will be.
It might help to do `readv` into dummy buffers to force the kernel to load the pages from disk to memory, so the subsequent page faults are minor instead of major. You're still not reducing the number of page faults though, and the total number of mode switches is increased.
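Something like the following is what I have in mind, as a rough sketch only. It uses concurrent `ReadAt` (pread) calls from goroutines as a stand-in for `readv`, since the Go standard library doesn't wrap `readv` directly. The names `prefetch` and `prefetchDemo` and the 4096-byte page size are my own assumptions for illustration, not anything from the repository.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sync"
)

// Assumed page size for the sketch; os.Getpagesize() reports the real one.
const pageSize = 4096

// prefetch reads one page at each offset into a throwaway buffer. The data is
// discarded; the only goal is to get the kernel to pull those pages into the
// page cache so that later accesses through a mapping are minor faults.
// The reads are issued from separate goroutines so they can overlap.
func prefetch(f *os.File, offsets []int64) (int64, error) {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		total    int64
		firstErr error
	)
	for _, off := range offsets {
		wg.Add(1)
		go func(off int64) {
			defer wg.Done()
			buf := make([]byte, pageSize) // dummy buffer, never inspected
			n, err := f.ReadAt(buf, off)
			mu.Lock()
			defer mu.Unlock()
			total += int64(n)
			// A short read at end of file reports io.EOF; that is not a failure here.
			if err != nil && err != io.EOF && firstErr == nil {
				firstErr = err
			}
		}(off)
	}
	wg.Wait()
	return total, firstErr
}

// prefetchDemo is a hypothetical driver: it writes a temp file of the given
// number of pages and prefetches every page. A freshly written file is already
// cached, so this only exercises the code path; it measures nothing.
func prefetchDemo(pages int) (int64, error) {
	f, err := os.CreateTemp("", "prefetch-sketch-*")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if _, err := f.Write(make([]byte, pages*pageSize)); err != nil {
		return 0, err
	}
	offsets := make([]int64, pages)
	for i := range offsets {
		offsets[i] = int64(i) * pageSize
	}
	return prefetch(f, offsets)
}

func main() {
	n, err := prefetchDemo(4)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("bytes pulled through the page cache:", n)
}
```

Note that this still costs one system call per page on top of the page faults you take afterwards, which is exactly the extra overhead I mean.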
Anyway, all of these workarounds are very complicated and will certainly have a lot more overhead than vectored IO, so I would recommend just doing that. The overall point is that mmap isn't friendly to concurrent reads from disk the way io_uring or `readv` are.
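Major page faults are basically the same as synchronous read calls, except that the Go runtime can't see them. When a goroutine blocks in a read call, the scheduler can keep running other goroutines in the meantime; when it blocks in a page fault, the OS thread is simply stalled.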
Fundamentally, the benchmarks in this repository are broken because in the mmap case they never read any of the data [0], so there are basically no page faults anyway. With a well-written program, there shouldn't be a reason that mmap would be faster than IO, and vectored IO can obviously be faster in various cases.
[0] E.g., see here, where the byte slice is assigned to `_` instead of being used: https://github.com/perbu/mmaps-in-go/blob/7e24f1542f28ef172b...
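For comparison, here is a minimal sketch of what the mmap side of a benchmark would need to do for the page faults to actually happen: read the mapped bytes and use the result. This is not the repository's code; `touchAll`, `sumMapped` and `touchDemo` are hypothetical names, and it assumes a Unix-like system where `syscall.Mmap` is available.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// touchAll reads every byte of data, which forces every page of a mapping to
// be faulted in. Returning the sum gives the caller something to use, so the
// loop cannot be discarded the way an unused slice can.
func touchAll(data []byte) uint64 {
	var sum uint64
	for _, b := range data {
		sum += uint64(b)
	}
	return sum
}

// sumMapped maps the file at path read-only, touches every byte, and unmaps it.
func sumMapped(path string) (uint64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	if info.Size() == 0 {
		return 0, nil // mmap rejects zero-length mappings
	}
	data, err := syscall.Mmap(int(f.Fd()), 0, int(info.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return 0, err
	}
	defer syscall.Munmap(data)
	return touchAll(data), nil
}

// touchDemo is a hypothetical driver: it writes n bytes of value 1 to a temp
// file and sums them back through a mapping, so the expected result is n.
func touchDemo(n int) (uint64, error) {
	f, err := os.CreateTemp("", "mmap-sketch-*")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	content := make([]byte, n)
	for i := range content {
		content[i] = 1
	}
	if _, err := f.Write(content); err != nil {
		f.Close()
		return 0, err
	}
	if err := f.Close(); err != nil {
		return 0, err
	}
	return sumMapped(f.Name())
}

func main() {
	sum, err := touchDemo(8192)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("sum of mapped bytes:", sum)
}
```

A real benchmark would also have to make sure the file isn't already sitting in the page cache, otherwise you're only measuring minor faults.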