Hacker News new | ask | show | jobs
by bertr4nd 2066 days ago
First it's important to note that `n` is unsigned; if it's signed the value of `-n % n` is 0, intuitively.

For unsigned n, the value is: MAX - n + 1 (where max is the maximum representable value in the type of n, e.g., UINT_MAX). The article explains this nicely. (I thought of 2's complement when reasoning through this, but you don't actually need to assume 2's complement to follow the reasoning).

So, `-n % n` computes `(MAX - n + 1) % n` efficiently, without needing to worry about corner cases.

I suspect this is useful when you want to generate random numbers with a limited range, where the range doesn't cleanly divide UINT_MAX. You need to cut a bit off the top from your underlying random number generator.

1 comments

-n % n (where n is unsigned) suffers from the problem that -n calculates a two's complement. That value is implementation-defined, due to the implementation-defined width of the unsigned type.

It's calculating ((-n) mod (2^bits)) mod n, where bits is compiler/platform-dependent.

  ;; TXR Lisp
  1> (defun -n%n (n bits)
       (mod (mod (- n) (expt 2 bits)) n))
  -n%n
  2> (-n%n 7 8)
  4
  3> (-n%n 7 16)
  2
  4> (-n%n 7 17)
  4
  5> (-n%n 7 18)
  1
  6> (-n%n 7 32)
  4
  7> (-n%n 7 64)
  2
I would not use this, except possibly with a precise-width type like uint32_t.
> -n % n (where n is unsigned) suffers from the problem that -n calculates a two's complement. That value is implementation-defined, due to the implementation-defined width of the unsigned type.

In C a "computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type." (C11 6.2.5p9) In other words, it behaves as-if it were two's complement. So the "problem" of assuming two's complement isn't actually a problem at all. (On machines that use signed-magnitude representation for integer types, for example, such as some only recently retired Unisys mainframes, C compilers actually emulate unsigned arithmetic in the generated code.)

While the number of value and padding bits of integer types (other than the fixed-width types) is implemented-defined, that dilemma is precisely what the construct -n % n is intended to deal with, and to do so in a type-safe manner (i.e. no risk of somebody changing the underlying types without changing UFOO_MAX to UBAR_MAX, presuming the macros even exist).

Your alternative construct just assumes "bits" out of thin air, but because the number of padding bits is implementation-defined in C you can only deduce this from UFOO_MAX[1]. In the next C version there should be new macros that specify the number of value bits, but there's still the type-safety problem of the compiler unable to enforce the constraint that a macro (or any other parameter, for that matter) represents a characteristic of the type your primary expression actually uses. This can be problematic even in languages where you can directly query properties of a type.

-n % n is well suited to its purpose. Elegant, even. And not just in the context of C.

[1] EDIT: Or, alternatively, (unsigned type)-1.

Woah, there's even a scary gotcha for types that aren't unsigned. E.g., check out what happens with "unsigned short": https://gcc.godbolt.org/z/4xWo1G. It looks like -n is implicitly converted (to signed int, maybe?) so unless you explicitly re-cast it to "unsigned short", you get something unexpected.
If n is unsigned short then -n means n is first promoted to either int, or unsigned int: the first of those two types which holds all values of unsigned short. Then the - operation is taking place on the resulting value in that promoted type.

An example where unsigned short promotes to unsigned int are platforms where sizeof(short) == sizeof(int). E.g. 16 bit systems where short and int is 16 bits, and long is 32: compilers for 8086 and such.

If the promotion goes to int, the subsequent - is safe; there is no way the resulting value can be that problematic most negative int of two's complement.

On two's complement systems, even if the - is signed, if you convert the value back to unsigned short, you will get the two's complement value, as if the calculation was done in the unsigned type all along.

For instance a 16 bit unsigned short value of 65535 (0xFFFF) will go to 65535 of type int, which will go to -65535, which will then map to 1 if converted to unsigned short.

Right - I mean, I’m not confused about what’s going on, now that I’ve seen the result. But I do think it is surprising that `unsigned int` and `unsigned short` behave differently in this expression, even if I basically see the logic in the standard.
Well, it computes "(MAX - n + 1) % n", where MAX obviously depends on the type.