by dfox
2363 days ago
From an architectural point of view, it is exactly the same thing if you think of it as implemented by an LL/SC pair. This is how it is presented in most of the literature and in university courses. Then there is the purely practical issue that x86 does not expose LL/SC and instead has a somewhat strict memory model and various lock-prefixed high-level instructions with wildly varying performance characteristics.
|
I'd be interested to see benchmarks showing that spinlocks with CAS have similar throughput to a ringbuffer with atomic increment.
Note that with the ringbuffer approach, each reader can process many slots at once, since you're taking the minimum of all published slot numbers. If you last processed slot 3, and the minimum publish count is 9, you can process slots 4 through 9 without doing any atomic operations at all. The design guarantees safety with no extra work.
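To make the batching concrete, here is a rough single-threaded sketch of the reader side. The names `readable_range`, `drain`, `published`, and `handle` are made up for illustration and are not the Disruptor API; in a real implementation each entry of `published` would be one atomic load of a producer's counter, and plain ints stand in for that here.

```python
def readable_range(last_processed, published):
    """Return the inclusive (first, last) slot range a reader may process, or None.

    `published` holds the highest slot number each producer has published so far
    (hypothetical representation). Every slot up to the minimum is safe to read.
    """
    safe_upto = min(published)
    if safe_upto <= last_processed:
        return None  # nothing new since the last batch
    return last_processed + 1, safe_upto


def drain(ring, last_processed, published, handle):
    """Process every safe slot in one batch and return the reader's new cursor."""
    batch = readable_range(last_processed, published)
    if batch is None:
        return last_processed
    first, last = batch
    for seq in range(first, last + 1):
        # Slot numbers grow forever; the index wraps around the fixed-size buffer.
        handle(ring[seq % len(ring)])
    return last
```

With `last_processed = 3` and `published = [9, 12, 11]`, `readable_range` returns `(4, 9)`: the counters are read once, and the whole batch is then handled without touching shared state per slot.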
It's not a minor trick; it's one of the main reasons throughput is orders of magnitude higher.
Benchmarks: https://github.com/LMAX-Exchange/disruptor/wiki/Performance-...
Beyond that, the ringbuffer approach also solves the problem you'll always run into: if you use a queue, you have to decide the max size of the queue. It's very hard to use all available resources without allocating too much and swapping to disk, which murders performance. Real systems with queues have memory usage patterns that tend to fluctuate by tens of GB, whereas a fixed-size ringbuffer avoids the problem.
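As a sketch of what "fixed-size" means in practice, assuming a single producer and a single tracked reader cursor (the class `BoundedRing` and its fields are hypothetical, and real code would use atomic counters instead of plain ints): the storage is allocated once, and a producer that gets too far ahead is refused rather than allowed to grow the buffer.

```python
class BoundedRing:
    """Single-threaded illustration of a ring buffer with a hard memory bound."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # allocated once, never resized
        self.next_seq = 0    # next slot number the producer will claim
        self.consumed = -1   # highest slot number the slowest reader has finished

    def try_publish(self, item):
        """Store item and return True, or return False if the ring is full."""
        # Writing now would overwrite a slot the reader has not finished with.
        if self.next_seq - self.consumed > len(self.slots):
            return False
        self.slots[self.next_seq % len(self.slots)] = item
        self.next_seq += 1
        return True
```

When `try_publish` returns `False` the producer has to wait or drop work, so a slow consumer shows up as backpressure instead of as growing memory use.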