| It is not just a way of writing ring buffers. It's a way of implementing concurrent, non-blocking, single-reader single-writer ring buffers using only atomic loads and stores (and memory barriers). The author says that non-power-of-two sizes are not possible, but I'm pretty sure they are if you use a conditional instead of integer modulus. I first learnt of this technique from Phil Burk; we've been using it in PortAudio forever. The technique is also widely known in FPGA/hardware circles, see: "Simulation and Synthesis Techniques for Asynchronous FIFO Design", Clifford E. Cummings, Sunburst Design, Inc. https://twins.ee.nctu.edu.tw/courses/ip_core_04/resource_pdf... |
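One way to read the "conditional instead of modulus" suggestion, as a minimal C11 sketch rather than PortAudio's actual code: let both indices run over [0, 2*CAPACITY) and wrap them with a comparison, so that equal indices mean empty and a difference of CAPACITY means full. The names `spsc_ring`, `ring_push`, `ring_pop` and the capacity of 5 are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define CAPACITY 5u /* hypothetical size, deliberately not a power of two */

typedef struct {
    int slots[CAPACITY];
    atomic_uint read_idx;  /* in [0, 2*CAPACITY), stored only by the reader */
    atomic_uint write_idx; /* in [0, 2*CAPACITY), stored only by the writer */
} spsc_ring;

void ring_init(spsc_ring *q) {
    atomic_init(&q->read_idx, 0u);
    atomic_init(&q->write_idx, 0u);
}

/* Step an index forward, wrapping at 2*CAPACITY with a conditional. */
static unsigned advance(unsigned idx) {
    idx++;
    return idx == 2u * CAPACITY ? 0u : idx;
}

/* Map an index in [0, 2*CAPACITY) to a slot, again without modulus. */
static unsigned slot_of(unsigned idx) {
    return idx >= CAPACITY ? idx - CAPACITY : idx;
}

/* Number of occupied slots given the write and read indices. */
static unsigned used(unsigned w, unsigned r) {
    return w >= r ? w - r : w + 2u * CAPACITY - r;
}

/* Writer side: returns false if the buffer is full. */
bool ring_push(spsc_ring *q, int value) {
    unsigned w = atomic_load_explicit(&q->write_idx, memory_order_relaxed);
    unsigned r = atomic_load_explicit(&q->read_idx, memory_order_acquire);
    if (used(w, r) == CAPACITY)
        return false;
    q->slots[slot_of(w)] = value;
    /* Release: the slot write becomes visible before the new index. */
    atomic_store_explicit(&q->write_idx, advance(w), memory_order_release);
    return true;
}

/* Reader side: returns false if the buffer is empty. */
bool ring_pop(spsc_ring *q, int *out) {
    unsigned r = atomic_load_explicit(&q->read_idx, memory_order_relaxed);
    unsigned w = atomic_load_explicit(&q->write_idx, memory_order_acquire);
    if (used(w, r) == 0u)
        return false;
    *out = q->slots[slot_of(r)];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&q->read_idx, advance(r), memory_order_release);
    return true;
}
```

Each index is stored by exactly one thread, so plain atomic loads and stores suffice and no compare-and-swap is needed. Running the indices over twice the capacity is what lets all CAPACITY slots be used while still telling full from empty.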
Intel still uses 64-byte cache lines, as it has for quite a long time, but it also does some shenanigans on the bus where it tries to fetch two lines when you ask for one. So there's ostensibly some benefit to 128-byte alignment, particularly for linear scans, on cold-cache access.
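If you want to try the 128-byte alignment, a rough C11 sketch might look like this. The helper name `alloc_scan_buffer` and the `PREFETCH_PAIR_BYTES` constant are made up here, and whether it helps at all is something to measure on your own hardware.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: two adjacent 64-byte lines are fetched together. */
#define PREFETCH_PAIR_BYTES 128u

/* Allocate a float buffer that starts on a 128-byte boundary.
 * The size is rounded up because C11 aligned_alloc expects a
 * multiple of the alignment. Release the result with free(). */
float *alloc_scan_buffer(size_t count) {
    size_t bytes = count * sizeof(float);
    size_t rounded = (bytes + PREFETCH_PAIR_BYTES - 1u)
                     / PREFETCH_PAIR_BYTES * PREFETCH_PAIR_BYTES;
    return aligned_alloc(PREFETCH_PAIR_BYTES, rounded);
}
```

With this, a linear scan starts at the beginning of a line pair instead of somewhere in the middle of one. `aligned_alloc` is standard C11 but not available on every toolchain, so treat the call as an assumption about your platform.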