Modulus by a power of two is cheap. Modulus by a constant is a multiplication by reciprocal and a shift. And if your argument is in [0..2N], mod N is just a conditional subtraction that doesn't even require a branch.
cheap is relative right? I mean a multiplication can be spread over shift and add/sub instructions whereas a mask is just one instruction I think right?