Hacker News new | ask | show | jobs
by nwellnhof 847 days ago
It should be noted that in C89, both behaviors of division and modulo are allowed and it's implementation-defined whether division results are truncated or rounded towards negative infinity. This has changed in C99 which only allows truncation towards zero.
3 comments

I've always considered it a mistake to refer to "%" as a "modulo" operator in C because of its ability to return negative numbers. That's not how modulo arithmetic works. C's "%" is a remainder operator.
> That's not how modulo arithmetic works.

We should be a bit careful with what one means by "modulo arithmetic" here. If we are talking about arithmetic in Z/nZ (read "integers modulo n"), then the objects being acted upon are no longer integers, but equivalence classes of integers. That is, the set of all integers that satisfy the equivalence relation "~" where "a ~ b" means a - b = k*n for some integer k. For example, one of the equivalence classes in Z/3Z would be the set {..., -4, -1, 2, 5, ...}.

Since every element of the set is equivalent under that relation, we typically will choose a representative of that equivalence class, e.g., [2] for the previous class. If we view the "mod" or "modulo" operator to be a mapping from the integers to a particular representative of its equivalence class, there's no reason that negative values should be excluded. [-1] refers to the same equivalence class as [2], [32], and [-7]. The details of how integers are mapped to a chosen representative seem to vary from language to language, but modular arithmetic works all the same between them.

But 2 mod 3 should still give the same as -1 mod 3. If the representative for the same equivalence class is allowed to be different depending on the input, you could just as well use the number itself.
At the risk of being snarky, I should certainly hope the output depends on the input.

On a more serious note, there are quite reasonable properties that are satisfied by keeping the sign of the first argument as the chosen representative. In particular, for any two nonzero integers a and b, we have that:

a = (a / b) * b + (a mod b)

Where division should be interpreted as integer division that rounds towards zero. If a=-1 and b=3, then (a/b) would be zero which would require (a mod b) to be -1 if we want the above to hold. Also note that other rounding choices (e.g., rounding towards negative infinity) could impose (a mod b) = 2.

So choosing a particular representative comes down to choosing what properties you want your function to have in relation to other arithmetic.

There is no "particular representative"; every equivalence class (except 0) has two possible representatives.

... and that's exactly why the C modulo operator sucks.

Neither of them is wrong. Languages (like python) which only return a positive number are returning the least positive residue of division under the modulus. That is the most common interpretation of it in number theory at least far as I know.

C generally does return negative numbers when you use the % operator.

Under modulo math, -1, 9 and 19, for example, are all the same element of the modulo 10 congruence. 9 is called the "least positive residue", which is sometimes important.
I have to re-learn this each year for Advent of Code, when inevitable my something % anything goes into the negative when searching downwards.
That's not any clearer. Both "modulo" and "remainder" can be defined in ways (yeah, more than one) that include or exclude negative numbers.

You have to memorize the full definition. There's no mnemonic shortcut.

Remainder goes with division, and respects invariants

Modulus doesn’t. I believe that algebraically equality module N can exist without division

Exactly. Modulo is not an operator, but a term for an equivalence class of differing by a multiple of a number fixed in advance.
> Modulo is not an operator,

"operator" is being used as a term of art in programming language design, to describe a symbol used to indicate that an operation should be performed on one or more operands.

https://en.wikipedia.org/wiki/Operator_(computer_programming...

So the "%" symbol is an operator in C and Python, and it is standard usage to describe the operator that performs a mathematical function "X" as the "X operator".

As sibling comments have mentioned, "operator" here is being used in the programming language sense of the word, not the mathematical sense. Using the name "mod" [as in mod(a,n)] or "modulo" for the mapping that takes integers to a particular representative of its equivalence class in Z/nZ ("integers modulo n") doesn't seem like an unreasonable choice to me.
(mod N) is an operator which indicates that the arithmetic in the preceding formula or equation is understood as taking place in a modulo N congruence.

A triple equal sign is usually used for modular equations. That triple equal sign, together with the (mod N) out to the right, constitute an operator.

-1 ≡ 9 (mod 10)

Since it is written as part of the syntax of a formula or equation, and modifies its semantics, it is an operator. Just like d/dx, sigma notation and whatever not.

That's an operator. A map to the quotient.
One thing that is still "implementation-defined" in C and C++ is the result of a right shift of a negative integer. On pretty much all platforms, the >> operator shifts in the sign bit and does not round the result — which makes it equivalent to flooring division by a power of two. It is consistent with the division operator only when the left-hand value is positive.
However, the oposite shift is undefined if a 1 goes into the sign bit.

More precisely, regarding E1 << E2, it is written (in the April 2023 ISO C draft):

"If E1 has a signed type and nonnegative value, and E1 × 2ᴱ² is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined."

Thus if E1 is negative, or if the result overflows, UB.

Didn't they specify two's complement signed integers somewhat recently? What's the rationale for leaving these behaviours undefined?
On some CPUs architectures, the operand size for some instructions could be larger than an `int`, in which case the upper part of a CPU register would become invalid on overflow instead of containing an extension of the sign bit.

There are also CPUs that do "saturating arithmetic" where overflow results in INT_MAX instead of wrapping.

Because the shift operations are arithmetically defined, and the situations that are undefined correspond to overflow. v << 1 means the same thing as v * 2 and is undefined if v * 2 is undefined.
mentioned in the comments on the post, which merit reading in this case

i haven't read the rationale, but presumably the committee did this because it's what virtually all cpus do (because it's what fortran does), so it's the only thing that can be implemented efficiently on virtually any cpu with a division instruction, and so virtually all c implementations did it, and standardizing the behavior of virtually all c implementations is better than leaving it implementation-defined or standardizing a behavior that conflicts with virtually all existing implementations and can't be implemented efficiently on most high-end hardware

Who would have ever thought that division could be so ... divisive?
as sophie w. arm said, 'i have a multiplier, not a divider'
At the RISC of sounding short on instructions, do you mean Sophie Wilson's ARM? ;)

https://en.wikipedia.org/wiki/Sophie_Wilson