One of the benefits of the original algorithm is the independence of the read and write indexes, they can be updated from different threads (or different processors!) without any atomic operations beyond writing or reading a value. Subtracting from both pointers requires an additional atomic read/modify/write operation.
You could also just restrict the pointers in the normal way but to two times the size of the buffer. So instead of wrapping at N you wrap at 2*N.
You are only encoding 1 bit of data (first or second) so adding more data than that by allowing unsigned integer overflow is just an optimization, not fundamentally necessary.
If you do that then the size() function becomes a problem. The original implementation relies on unsigned integer wrap-around to give the proper result when write < read.
It all depends on your use case. In my experience, most of the times when dealing with RB:s, it's more important to guarantee that the read index and the write index can be updated lock-free by different threads (e.g. one producer and one consumer) than to have a very specific non-power-of-2 capacity.
No, it's not. A non-power-of-two will lead to incorrect behavior, implementing the data structure the way the author describes, for precisely the reason given by your parent comment. There are other implementations that don't suffer from this (described in the comments there), but it's not simply a matter of replacing bitwise-and with a more generic computation of the modulus.
Imagine we had four bit integers and a three cell array. Stepping from 7 (= 1 mod 3) to 8 (= 2 mod 3) winds up stepping instead to 0 (= 0 mod 3) because overflow, which would reuse cells inappropriately.
Yes, any choice. Yes, that's not the case if you change the implementation in other ways in tandem - but then you are describing a different implementation. To mask where he does, any choice of mask will exhibit this problem (or other problems), mathematically. I can produce a proof if you need it.
> He only ever calls mask() after an increment.
I think you misunderstand what he's talking about. He calls mask at insert and lookup and he does not store the masked value - which leaves the implementation vulnerable to overflow (which is not a problem iff the array is 2^n big).
You might have misread the OP. He's not asking why using & for the mask requires a 2^n sized buffer. He's asking why bother using & when it imposes these additional constraints on us. A part of the answer is that the constraints are already there with this approach, even if you pick a different function for masking than f(i,n) = i & (n - 1) -- which point OP only recently understood due to a misreading of the article.
- subtracting buffer size from both pointers once the read pointer has wrapped.
- choosing a longer int for the math operation where possible
That seems a small price for the freedom to be able to choose an appropriate buffer size.