Hacker News new | ask | show | jobs
by jgrahamc 3477 days ago
No, it's an optimization.
3 comments

No, it's not. A non-power-of-two will lead to incorrect behavior, implementing the data structure the way the author describes, for precisely the reason given by your parent comment. There are other implementations that don't suffer from this (described in the comments there), but it's not simply a matter of replacing bitwise-and with a more generic computation of the modulus.

Imagine we had four bit integers and a three cell array. Stepping from 7 (= 1 mod 3) to 8 (= 2 mod 3) winds up stepping instead to 0 (= 0 mod 3) because overflow, which would reuse cells inappropriately.

I never claimed you should use a modulus operation.
You said,

> It's unclear to me why the focus on a 2^n sized buffer just so you can use & for the mask.

In fact it is required for correctness to use the approach he specified with any choice of masking operation.

Any choice?

He only ever calls mask() after an increment. There are other ways of detecting the overflow and wrap to 0 that don't involve modulus.

Yes, any choice. Yes, that's not the case if you change the implementation in other ways in tandem - but then you are describing a different implementation. To mask where he does, any choice of mask will exhibit this problem (or other problems), mathematically. I can produce a proof if you need it.

> He only ever calls mask() after an increment.

I think you misunderstand what he's talking about. He calls mask at insert and lookup and he does not store the masked value - which leaves the implementation vulnerable to overflow (which is not a problem iff the array is 2^n big).

That said, one of the comments on the blog actually had a great suggestion for dealing with this. Wrap the value yourself at 2*capacity, you'll still get the same benefits from the algorithm and it's trivial to prevent the overflow then. You can then avoid the modulo operation (subtract the capacity if it's a larger index) and get better performing non-power of 2 capacities.

That said I'd mostly be using this kind of thing in a situation where i'd want to have a power of two sized buffer anyway.

A ha! That is the point I was missing. Thanks.
Array size of 5 produces a mask of 0100

Write index of 8 = 1000

Masking those together to create a write position 1000 & 0100 = 0

Doesn't work out correctly, got 0, would expect to get 2. In fact, you could never get a write position of 1, 2, or 3 with an array size of 5.

This is not the problem.

You are assuming still using &, which is very obviously incorrect with a very obvious fix (%5) which is still wrong because of behavior at overflow.

I'm answering the specific problem posed by the OP, given his other comments in this thread.

[EDIT] Resolved internal concerns about size calculations.

You might have misread the OP. He's not asking why using & for the mask requires a 2^n sized buffer. He's asking why bother using & when it imposes these additional constraints on us. A part of the answer is that the constraints are already there with this approach, even if you pick a different function for masking than f(i,n) = i & (n - 1) -- which point OP only recently understood due to a misreading of the article.
This is not only an optimisation. Power of 2 is necessary to avoid discontinuity. See the comments below the article for explanation.