by lq0000 522 days ago
I collected some more measurements with different implementations of the get function.

1000000 iterations, buffer 8192, message 128

i9-13900K, gcc 13.3, -O3, kernel 6.8.0-49, glibc 2.39

byte-mod: 0.117353 us/iter

byte-and: 0.065379 us/iter

byte-sub: 0.027865 us/iter

1cpy-mod: 0.001143 us/iter

1cpy-and: 0.001100 us/iter

1cpy-sub: 0.001098 us/iter

2cpy-mod: 0.008140 us/iter

2cpy-and: 0.001100 us/iter

2cpy-sub: 0.007711 us/iter

funny-mod: 0.001145 us/iter

funny-and: 0.001101 us/iter

funny-sub: 0.001100 us/iter

where:

`byte` is per-byte copy

`1cpy` is a single-memcpy (assumes message size divides buffer size evenly)

`2cpy` is the split memcpy (which supports other message sizes)

`funny` is single-memcpy with the double-memmapped buffer

and:

`mod` uses `head %= buffer_size`

`and` uses `head &= buffer_size - 1` (only valid when the buffer size is a power of 2)

`sub` uses `if (head >= buffer_size) head -= buffer_size`
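For reference, here is a minimal sketch of what the three wrap strategies and the split (`2cpy`) copy could look like. The benchmark code itself isn't shown in this comment, so the `struct ring` layout and the names `wrap_mod`, `wrap_and`, `wrap_sub` and `get_2cpy` are hypothetical. Note that the `and` variant has to mask with `size - 1`, which only works for power-of-2 sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ring buffer layout; not taken from the benchmarked code. */
struct ring {
    unsigned char *data;  /* backing storage of `size` bytes */
    size_t size;          /* buffer size in bytes */
    size_t head;          /* read position, always < size */
};

/* `mod`: works for any buffer size, but may compile to a division. */
static size_t wrap_mod(size_t head, size_t size) {
    return head % size;
}

/* `and`: valid only when size is a power of two; the mask is size - 1. */
static size_t wrap_and(size_t head, size_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return head & (size - 1);
}

/* `sub`: valid as long as head < 2 * size before wrapping. */
static size_t wrap_sub(size_t head, size_t size) {
    if (head >= size)
        head -= size;
    return head;
}

/* `2cpy`: copy len bytes out, splitting at the end of the buffer. */
static void get_2cpy(struct ring *r, void *out, size_t len) {
    assert(len <= r->size);
    size_t first = r->size - r->head;  /* bytes left before the end */
    if (first > len)
        first = len;
    memcpy(out, r->data + r->head, first);
    memcpy((unsigned char *)out + first, r->data, len - first);
    r->head = wrap_sub(r->head + len, r->size);  /* any wrap variant fits here */
}
```

With a power-of-2 buffer all three wrap functions return the same value, so they are interchangeable in `get_2cpy`; only the generated code differs.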

The tl;dr here is that the double-memmapped buffer performs the same as the 1-memcpy implementation that doesn't do anything funny, and about 7x better than the 2-memcpy implementations using `mod` or `sub` (roughly 0.0011 vs 0.008 us/iter). The exception is the 2-memcpy implementation with `and`, which matches it. Since your page size is gonna be a power of 2 anyway, this means the trick is not really worthwhile and you should just use `and`. Compared to the other wrapping implementations it does still provide a tangible improvement, though. That may just come down to how the compiler was able to optimize it, but I don't feel like digging through the generated code to figure out why right now.