Hacker News new | ask | show | jobs
by a1369209993 2119 days ago
Consider the idiomatic way of interating backward through a array:

  for(i=n; i-- > 0 ;)
    { /* operate on a[i] */ }
converting i--; to a statement at the start of block makes it less clear that it's part of the iteration idiom rather than a ad hoc adjustment that's specific to this particular logic. There are other examples, but they're either more involved or statementification is less obviously wrong.
2 comments

Hmm, I think `for (i = n - 1; i >= 0; --i)` is way clearer and maybe more common?

edit: Ah unsigned underflow. :O

Yeah, so then you write

    for (size_t i = n-1; i < n; --i) { /* operate on a[i] */ }
It works fine (unsigned overflow is well defined) but it's even less clear.
It seems sensible to always just use signed values for indices. Indices are difference types, which should include negative values so that you can subtract two indices and get a sane delta. The range of signed values seems 'big enough.'
> Indices are difference types

Umm, no? Indices are ordinals[0], forming the canonical/nominal well-ordering of a collection such as a array.

> an ordinal number, or ordinal, is one generalization of the concept of a natural number that is used to describe a way to arrange a (possibly infinite) collection of objects in order, one after another. [...] Ordinal numbers are thus the "labels" needed to arrange collections of objects in order.

0: https://en.wikipedia.org/wiki/Ordinal_number

In C an index is a difference that you add to a pointer to get a pointer. `a[i]` is `*(a + i)`. Given two indices `i` and `j`, you want `i - j` to be such that `a[j + (i - j)]` is `a[i]`, and it then makes sense to me that `i - j` is signed. The expression works out whether they are signed or unsigned, but just in terms of their interpretation on the part of a user (eg. "oh this is 2 elements before bc. it says -2") or so that comparisons like `i < j` are equivalent to `i - j < 0` and so on. That's why it's always made sense to me to use `ptrdiff_t` (or just `int`) for an index, vs. using `size_t`.
ptrdiff_t exists for subtraction between pointers that produce negative values. But how many times have you ever needed to subtract p and q where p represents an array element at a higher index than q? For that matter, how many times have you ever needed to add a negative integer to a pointer?

In C an object can be larger than PTRDIFF_MAX, a real possibility in modern 32-bit environments. (Some libc's have been modified to fail malloc invocations that large, but mmap can suffice.) Because pointer subtraction is represented as ptrdiff_t, the expression &a[n] - a could produce undefined behavior where n is > PTRDIFF_MAX. But a + n is well defined behavior for all positive n (signed or unsigned) as long as the size of a is >= n.

There's an asymmetry between pointer-pointer arithmetic and pointer-integer arithmetic; they behave differently and have different semantics. Pointers are a powerful concept, but like most powerful concepts the abstraction can leak and produce aberrations. I realize opinions vary on whether to prefer signed vs unsigned indices and object sizes (IME, the camps tend to split into C vs C++ programers), but the choice shouldn't be predicated on the semantics of C pointers because those semantics alone don't favor one over the other.

The C language "de facto" uses size_t for indexing and ptrdiff_t for differences, or the rare case where you have a negative index.
size_t is unsigned? Since when?
It always has been. C89, 4.1.5[1]:

> The type are [...] size_t which is the unsigned integral type of the result of the sizeof operator

(Emphasis mine.)

1. https://port70.net/~nsz/c/c89/c89-draft.html#4.1.5

Couldn't find an online version of the C standard with links to parts of it, but here's one for C++: http://eel.is/c++draft/support.types#layout-3

> The type size_­t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object ([expr.sizeof]).

Yes. ssize_t is signed.
ssize_t should never be used.

It's not guaranteed to have a full negative range, only to be able to represent -1.

Use ptrdiff_t as a signed size type.

Out of curiosity, do you know of any implementations where ssize_t has that kind of range limitation?
That's the idiomatic way? Cool. The more straightforward-looking way,

    for(i = n-1; i >= 0; i--)
        { /* operate on a[i] */ }
breaks if i is unsigned, like a size_t.
Yep. That why it's a idiom, rather than a obvious-way-of-doing-it-that-anyone-competent-would-use.