| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by twanvl 1392 days ago

Other advantages of zero based indexing, beyond being 'closer to the machine':

It works better with the modulo operator: `array[i%length]` vs `array[(i+length-1)%length+1]`. Or you would have to define a modulo-like operator that maps ℕ to [1..n].

It works better if you have a multi-dimensional index, for example the pixels in an image. With 0 based indexing, pixel `(x,y)` is at `array[x+widthy]`. With 1 based indexing it is at `array[x+width(y-1)]`. You might argue that programming languages should support multi-dimensional arrays, but you still need operations like resizing, views, etc.

6 comments

armchairhacker 1392 days ago

Another advantage is with ranges: 0-based indexing and exclusive ranges work well. This is apparent with cursor position in text selection

Consider:

    Characters       h e l l o
    Cursor index    0 1 2 3 4 5
    Char index       0 1 2 3 4
    Range [0,3)     [0,1,2]
    Range [2,5)         [2,3,4]
    Range [1,1)       []

If we used 1-based indexing and exclusive ranges, it leads to ranges where the end index is greater than the string's length...

    Characters       h e l l o
    Cursor index    0 1 2 3 4 5
    Char index       1 2 3 4 5
    Range [1,4)     [1,2,3]
    Range [3,6) (!)     [3,4,5]
    Range [2,2)       []

but if we use inclusive ranges, it leads to ranges where the end index is less than the start index...

    Characters       h e l l o
    Cursor index    0 1 2 3 4 5
    Char index       1 2 3 4 5
    Range [1,3]     [1,2,3]
    Range [3,5]         [3,4,5]
    Range [2,1) (!)   []

Also:

    Characters            h e l l o
    Cursor index         0 1 2 3 4 5
    0-based range [0,3)  [0,1,2]
    1-based range [1,4)  [1,2,3]

for the 0-based range [0, 3), the left array bracket is at cursor index 0, and the right bracket is at index 3. With 1-based indexing it doesn't work like that because the range is [1, 4)

samatman 1392 days ago

This is the bane of my existence working with Lua.

Iterating an array or adding to the end are fine, we have ipairs and insert for that, but ranges on strings I'm constantly having to think harder and write more code than necessary.

I love the language, wouldn't trade it for another, but the 1-based indexing on strings, which represents an empty string at position 3 as (3,2), it's egregious.

Not as egregious as a dynamic language where 0 is false though.

chii 1392 days ago

> a dynamic language where 0 is false though.

well, C also considers 0 being false (and you can argue that C is "dynamic"!).

yccs27 1392 days ago

Yeah, offsets are just easier to mathematically manipulate than ordinals. It's not just pointer arithmetic where it matters that item i corresponds to start+i*step. Any time you want to convert between integer indices and general linearly-spaced values, 0-based indexing is more convenient.

tabtab 1392 days ago

In many domains code maintenance is more important than hardware costs. In many domains 1-based indexing is a better fit, meaning less conversion code, meaning simpler code. Thus, the best indexing choice depends on the domain and circumstances, as do many controversial questions. Most tend to specialize in specific kinds of domains and over-extrapolate their experience into other domains.

wizofaus 1392 days ago

I'd agree so why did languages that allow specifying the base die (other than maybe VBA).

btilly 1392 days ago

Because changing the base is a great source of bugs.

That said, not all languages have given up on this. For example Julia allows it. https://docs.julialang.org/en/v1/devdocs/offset-arrays/

Ironically I learned this from a discussion of Julia bugs. Apparently changing offsetting of arrays has proven to be a source of bugs in Julia. So maybe someday they will come to the same conclusion as languages like Perl and stop allowing it.

wizofaus 1392 days ago

I'd argue 0-base is a source of bugs too! Ideally we'd be able to catch more array indexing bugs at compile time - there are definitely cases where it should be possible to determine that arrays are being incorrectly indexed via static analysis.

btilly 1392 days ago

The problem is that libraries which assume 0-base break when you have a 1-based array. And vice versa. Trying to combine libraries with different conventions becomes impossible.

Therefore changing the base leads to more bugs than either base alone.

That said, the more you can just use a foreach to not worry about the index at all, the better.

Of 0-based and 1-based, the only data point I have is a side comment of Dijkstra's that the language Mesa allowed both, and found that 0-based arrays lead to the fewest bugs in practice. I'd love better data on that, but this is a good reason to prefer 0-based.

That said, I can work with either. But Python uses 0-based and plpgsql uses 1-based. Switching back and forth gets..annoying.

wizofaus 1392 days ago

I'd expect the compiler not to let you to pass a 0-based array to a library function expecting a 1-based array. I'm pretty sure that's how it worked with Visual Basic, which was the only language I ever used such a feature in.

bluetomcat 1392 days ago

A disadvantage comes to mind. While the following loop works as expected:

    for (size_t i = 0; i < length; i++) ...

The following causes an unsigned integer underflow and is an infinite loop:

    for (size_t i = length - 1; i >= 0; i--) ...

mananaysiempre 1392 days ago

As Jens Gustedt points out[1], the following intentional unsigned overflow works perfectly for downwards iteration (even when length is 0 or SIZE_MAX), though it looks a bit confusing at first:

  for (size_t i = length - 1; i < length; i--) ...

You are also free to start at any other (not necessarily in-bounds) index, just like with ascending iteration.

[1] https://gustedt.wordpress.com/2013/07/15/a-praise-of-size_t-...

salawat 1392 days ago

Principle of least surprise violated.

Also, that behavior is not guaranteed. The programmer would need to be aware of how the particular machine in question actually handles that.

Then again, that's C.

sersnth 1392 days ago

Unsigned integer underflow and overflow are both guaranteed to wrap by the C standard.

411111111111111 1392 days ago

Uh I'm confused but don't know c++.

why doesn't that loop end instantly?

I mean length - 1 < length should always be true, right?

Or does it only terminate when the number underflows? Terribly confused here

Jtsummers 1392 days ago

For loops are translatable from:

  for(initialize; condition; increment) { ... }

to:

  initialize;
  while(condition) {
    ...
    increment
  }

(more or less, some scoping things not encompassed by the above; this is also how pretty much every for loop in a C-syntax language works) The condition of a for loop is equivalent to a while loop's condition. So yes, length - 1 < length will be true on the first iteration, which is fine because the loop continues as long as that condition is true.

What the above approach takes advantage of is that when underflow eventually happens you'll have this condition:

  MAXINT < length

Which will terminate it for all possible values of length.

bobthepanda 1392 days ago

It‘s an unsigned int, so past 0 it overflows back to the maximum

411111111111111 1392 days ago

Ooh, i see. I wasn't aware that they under/overflow at 0 when they're unsigned. Thanks for broadening my horizon!

naavis 1392 days ago

It loops while the condition is true. When an underflow happens, it stops being true.

msk-lywenn 1392 days ago

It’s a condition to run, not a condition to stop

wizofaus 1392 days ago

That was my reaction - why should anyone think it might be the latter? Are there languages that do have such a syntax without explicit keywords ("do...until")?

msk-lywenn 1382 days ago

I think lisp or scheme does. I was often confused by that when I was playing with it

billforsternz 1392 days ago

The loop continues until i transitions from 0 to 0 minus 1. 0-1 in this case actually doesn't equal -1 since size_t is an unsigned type, instead it wraps around to be the largest possible positive integer instead. TLDR; yes as you speculate it terminates when the number underflows.

xxs 1392 days ago

the footnote [1] should be [0], just for the sake of this very topic.

Seriously though, while the idiom does work for unsigned integers, it's a bad idiom to learn [makes code reviews harder]. The post-decrement one in the loop body works with everything (signed/unsigned), and it's well known.

operator-name 1392 days ago

In that specific case I'd do the following:

    for (size_t i = n; i-- > 0 ;) ...

Or count from `length` to 1, but subtract 1 in the loop body, or count up and subtract the length in the loop body. Any modern compiler should be able to optimise these to be equivalent.

In the majority of cases, counting down is not necessarily. Nor is ordered iteration. Most languages have a `for each` style syntax that's preferable anyway.

cylon13 1392 days ago

Ah yes, the goes-to operator -->

tomjakubowski 1392 days ago

And the wink operator, so the compiler knows you know the deal

bmacho 1392 days ago

I like how GP is called "operator-name" and instead of doing himself, he makes others joking with operator names. Although I'm not sure if it's altruism or highly manipulative behaviour.

wizofaus 1392 days ago

Gold - have an upvote!

divbzero 1392 days ago

Or alternatively:

  size_t i = length;
  while (i--) ...

jecel 1391 days ago

Unless I am remembering C wrong, this would work but the

   for (size_t i = length; i-- > 0 ; ) ...

that several other people posted would not execute for index 0. Shouldn't it be this instead?

   for (size_t i = length; --i > 0 ; ) ...

xxs 1392 days ago

the correct way/idiom to reverse iterate an array is

  for (size_t i = length; i-- > 0; )...

It's surprising how often the issue pops, it works well with both signed and unsigned integers.

(edit) I've started with one based indexing (basic)... mixed with 0 based (assembly), more 1 based (pascal), then more stuff (all zero based). I am, yet, to see a real advantage of a one based indexing... after the initial process.

empiricus 1392 days ago

Is this cache friendly?

sgtnoodle 1392 days ago

Sure, why wouldn't it be? As far as a cache is concerned, I don't think reverse sequential iteration would be any different than forward sequential. The actual RAM accesses may be less optimal if there's some speculative pre-fetching with assumed forward sequential access, but that's conjecture.

ehvatum 1392 days ago

With some exceptions, hardware prefetch works in terms of ascending accesses. To learn if a particular CPU will prefetch for descending access, benchmarking is essential. Best to use soft prefetch calls if performance is critical.

chii 1392 days ago

i would suspect that the cache prefetch/prediction could use the "velocity" of the memory access to predict the next access; so if the access pattern was going backwards, the "velocity" would be negative, but prefetching would still work if they just followed the predicted pattern.

ehvatum 1392 days ago

It is not, unless your compiler is smart enough to recognize reverse iteration and prefetch appropriately.

If performance matters, you should experiment with __builtin_prefetch, which is available in clang and GCC.

msk-lywenn 1392 days ago

It’s not. It was nice on architectures were cache didn’t matter much and were subtracting and comparing to zero was just one instruction (looking at you old core ARM)

bregma 1392 days ago

In the C programming language unsigned integers do not overflow. They wrap. This is well-defined behaviour and the example code is simply incorrect. Most modern compilers will give you a diagnostic for this.

ncmncm 1392 days ago

If you have foolishly turned off -W in your build system, that could happen. Otherwise, you get a nice warning pointing out your folly.

nemetroid 1392 days ago

Unless wrapping underflow is sensible for the domain (which it isn’t when representing the size of something), unsigned integers are usually a bad idea.

pwdisswordfish9 1392 days ago

You can always rip a page out of C++’s playbook:

    for (size_t i = length; i > 0; i--) {
        // ...
        item = array[i - 1];

(This is how reverse iterators work in C++.)

10000truths 1392 days ago

  for (size_t i = 0; i < length; i++) {
      size_t j = (length - 1) - i;
      ...
  }

EDIT: change i to j

yuliyp 1392 days ago

That's quite broken (you'd want a different variable inside the body, vs clobbering the iteration counter, else this would process the last item in your list, then exit).

rabbidruster 1392 days ago

I hope you haven't done this anywhere.

Makes my brain hurt, but I think this will only run through the loop one time looking at the last element of the array.

unwind 1392 days ago

Uh no don't reassign the loop variable in the inner scope. Use:

    const j = (length - 1) - i;

in that case. Much safer.

Banana699 1392 days ago

You should save the old value of i somewhere and restore it back at the very end of the loop. Or simply define a new j like another comment says.

mananaysiempre 1392 days ago

A rare case where 1-based indexing is more convenient is complete binary trees laid out breadth-first (as in a standard binary heap): parent is i div 2 and children are 2i and 2i+1 when starting at one and who knows what when starting at zero. But that’s the only one I know.

wizofaus 1392 days ago

Except 1-based indexing is what we use in normal language. We don't use "zeroeth" or "player (number) zero" etc. And the word "first" is shortened to 1st etc. Personally I think we'd be better off if programming languages stuck to the same convention - off-by-1 errors aren't the hardest problems to deal with but they're still annoying.

ChickeNES 1392 days ago

Well: https://www.merriam-webster.com/dictionary/zeroth

wizofaus 1392 days ago

Sure, it's technically a word, but hardly one you'd casually drop into your conversations (with non-programmers)

Quekid5 1392 days ago

> "player (number) zero"

That would be because any game worth playing has at least one player... and so it's natural to continue from there. (In terms of language.)

OscarCunningham 1392 days ago

You might have heard of Conway's Game of Life.

Quekid5 1390 days ago

Fair :)

lokedhs 1392 days ago

That's confusing ordinal and cardinal numbers. The element with index 0 is the first number. The element with index 1 is the second number, and so on.

Using the term "zeroth" is basically some form of showing off (even though it's kinda fun), but will be utterly confusing when you get to the fifty-second element which is the last in a group of 53 elements.

wizofaus 1392 days ago

I'm not confusing them, my point about abbreviating "first" as 1st was that in typical speech we start counting at 1. Nobody says "let's start with item zero on the list". But programmers are stuck with having to say/think "item 0 in the array".

lokedhs 1392 days ago

I don't disagree with you. I just don't think a programmer should be confused about the statement "item 0 is the 1'st item".

misttar 1392 days ago

The 2 major blunders in programming: 1) Off by one errors.

btilly 1392 days ago

With 0-based indexing the children are at 2i+1 and 2i+2. The parent is at (i-1) div 2.

Not hard to figure out.

JadeNB 1392 days ago

> With 0-based indexing the children are at 2i+1 and 2i+2. The parent is at (i-1) div 2.

> Not hard to figure out.

While that's true, "you just shift by 1" is equally good at all arguments for or against 0-based indexing, so deploying it here probably won't convince.

btilly 1392 days ago

I was not trying to convince.

That said, the effort of one versus the other is so trivial that there is no point in ever using effort as an argument either way. Doubly so because what seems like effort to us is simple unfamiliarity.

What is important is which one leads to more careless errors in practice. As a trivial example, consistent indentation takes effort, but failing to do it leads to more careless errors. Therefore everyone indents code.

The only data point I've seen on that is the side remark about Mesa in https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E.... That remark, therefore, is the only argument that I care about.

naet 1392 days ago

An incredibly unrare case happens all the time in my work: array[length - 1], or variations of this. Anything involving the last element of the array, and often iterating through the whole array will use something similar at some point.

msdrigg 1392 days ago

This is the reason I always think of when I hear the question

OscarCunningham 1392 days ago

I think the 1-indexing folks would have to argue that a%b should return a value from 1 to b inclusive. This does make the same sort of intuitive sense as 1-indexing. For example we number clocks from 1 to 12.

vore 1392 days ago

% is defined as (mostly) a remainder operator. I don't think you want to change the semantics of division itself.

OscarCunningham 1392 days ago

Right. Ultimately the 1-indexers are wrong.

black_knight 1392 days ago

But interestingly, the number at the top is 12, not 1.

pvinis 1392 days ago

No we don't. Clocks start at 0. what time is it when it's half an hour after midnight? 00:30.

chii 1392 days ago

but nobody "says" zero o'clock - it's always twelve o'clock!

and the example is in 24hr format - which needs to have the 00 to differentiate it from being 12:30. But if you write in 12hr format, you don't ever use 00 - it's always 12:30am or 12:30pm

pvinis 1392 days ago

First of all, you don't. Or I guess most Americans don't. I do, or most Europeans do. 12.30am and 12.30pm feels just very wrong in Europe.

But yea, I do agree with you. we don't say 0, we say 12, no matter if it's noon or midnight. That's because we humans avoid saying zero when we mean zero.

It's the same with other comments here, talking about counding seconds, we say "ok go, one, two,.. and not "zero, one, two,..". We say 3 months old baby, and not 0 years old baby.

We use 0-index in so many things, we just avoid saying the word "zero", and we use other names or other units to avoid that word.

lokedhs 1392 days ago

I'm from Sweden, and I certainly would say 00:15 (as zero fifteen). Although the time at exactly 00:00 would be called midnight.

cryptonector 1392 days ago

12 is zero, in 12-hour clocks.