Hacker News new | ask | show | jobs
by kbd 2533 days ago
> The classic example is getting the last element of an array.

Good point, in Python I don't notice that the last element is arr[len(arr)-1] because Python provides arr[-1]. I think in general your point is that it's natural for the nth element to be arr[n].

> The reason for zero indexing is historical, related to pointer offsets.

There is that, but Dijkstra's paper makes the case from first-principles that the closed, open interval of [0,n) for sequences is the most appropriate.

> with 1-indexing I can multiply numbers by arrays... 3 x 1 is three...

Sorry, I don't understand this. It makes sense that the point I don't understand is probably most-related to why Julia chose its indexing scheme and why Matlab et al. do the same.

> One nice this about 0-indexing is that I can slice a list in half with the same midpoint.

Yeah, arr[0:index] + arr[index:len(arr)] is the full list. And to your point earlier ("if I take the first nth elements of a list"), len(arr[:n]) == n seems natural.

Edit: I've been trying to formalize why Python's indexing scheme, along with its negative-indexing, is optimal (slight pseudocode):

    l = ['a','b','c']
    n = len(l)
    i = -n
    while i < n:
        print(l[i++])
prints "a b c a b c". That code makes no reference to any bound but 'n', nor any constants (1,0) or offsets, yet it iterates over the list twice through its range (first negative indices then positive).
2 comments

Dijkstra's write-up is full of subjective aesthetic judgements that certain things are ugly. I personally don't find `1:0` for an empty sequence to be ugly, and I do find using `1:1` to refer to an empty sequence and `1:2` to refer to the sequence `{1}` to be ugly. I would encourage everyone to read over his reasoning and see if you agree with his aesthetic judgements.
I’m constantly baffled by the way people hold that paper up as some sort of objective proof that 0 based indexing is superior.
Zero based indexing objectively has various convenient properties which one-based indexing doesn't.

The value of convenience over inconvenience isn't objectively better, that's all.

Objectively speaking, if I find the least positive residue of some integer modulo M, I get a value from 0 to M-1. If my M-sized array is from 0 to M-1, that is objectively convenient:

  hash_table[hash(string) % table_size]
Objectively speaking, if I have some files in a directory and I give them zero based names like file000, file001, ... then I can objectively refer to the first volume of ten using the single pattern file00?, and the next volume as file01?. If they are numbered from 001, I need file00? file010 to match the first ten. For the next ten, the file01? pattern unfortunately matches file010 so I objectively need some way to exclude it.

Objectively speaking, if I have zero based indices to a 3D array, I can find the address of an element <i, j, k> using the homogeneous Ai + Bj + Ck rather than Ai + Bj + Ck + D, which objectively adds an extra term.

Objectively speaking, the zero-based byte index B can be converted to a four byte word index using B / 4 (truncating division), and within that word, the zero-based byte local offset is B % 4. Objectively speaking, the same conversion from 1-based bytes to 1-based words requires (B-1)/4+1 and (B-1)%4+1, which is objectively more syntax and more nodes in the abstract syntax tree.

There is no reason I should like shorter, simpler, faster; that's a purely subjective aesthetic. After all, a short poem isn't better than a long one; a bacterium isn't better than a rhinoceros; and so on.

Hey, how about those one-based music intervals? C to E is a major third and all that? We have a diatonic scale with seven notes, right? As we ascend, whenever we cycle through seven notes, we have passed ... one more octave. And to invert an interval, we subtract from ... why, nine of course! And a fifth stacked on top of a fifth is a major ninth. There is objectively more cruft in 1 based music theory than 0 based. But there is no accounting for people liking it that way, right?

You do realize that there are approximately just as many use cases where 1 based indexing is more natural and involves less operations, right? It’s highly problem and context dependant.

But I’m not trying it make the point that 0 based or 1 based is better or worse than the other. I’m just saying that it’s a borderline immaterial difference for most use-cases and Julia gives many many tools for getting around any problems that may arise when a certain indexing scheme is awkward.

The 0 or 1 based debate is one of the most boring and pedantic arguments one can have and I do my best to ridicule people when they try to start it.

"Objectively speaking" there are pros and cons to each system. The largest pro of 0-based indexing is of course that it can correspond to a memory address plus an offset, which is the reason C (and derived languages) use 0-based.

But it is also an objective fact that using 1-based indexing means that the index corresponds to the ordinal numbers, e.g. index 1 is the first element, index 2 is the second element and so on. This also have a number of convenient properties.

For example February is the 2. month, so if you have a list of the names of the months, you would expect month_names[2] to be February. With zero-based you would have to do month_names[month_number - 1]. And if you want to get the month number from the name, you would have to do month_names.index_of(month_name) + 1. Be careful not to switch up the +1 and -1!

As for music theory, a third is called so because it spans three half-notes. It describe the size of a range which is independent of the offset of the indices. By the same token decades are 10 years (not 9) and centuries are 100 years (not 99).

Some machine-level implementation convenience is the smallest advantage. Zero based would be better even if it cost more at the machine level. Of course it doesn't cost more because the advantages are relevant at the implementation level also.

> For example February is the 2. month, so if you have a list of the names of the months, you would expect month_names[2] to be February.

That's not 1-based indexing being good; that's conforming to (or reflecting) an externally imposed 1-based system that is itself questionable.

Should the seconds of a minute go from 1 to 60 instead of 0 to 59? Dates and times are full of poorly chosen conventions, including ones that don't match people's intuitions. For instance, many people celebrated the new millennium in January 2000. People also want decades to go from 0 to 9; the "eighties" are years matching the pattern 198?, not 1981 to 1990. Yet the 20th century goes from 1901 to 2000.

In many situations when 1 based numbering is used, it's just symbols in a sequence. It could be replaced by Gray code, greek letters, or Japanese kana in i-ro-ha order.

When the arithmetic properties of the index matter to the point that it's involved in multiplication (not merely successor/predecessor relationship), it is advantageous to make the origin zero.

If month_name[1] must be "January", I'm okay with wasting month_name[0]; that's better than supporting a one-based array (let alone making that default).

> As for music theory, a third is called so because it spans three half-notes.

No it doesn't; in that very same music theory, a "major second" interval is also known as "one step" or a "whole step"! A third is "two steps"; that's what it spans. (I don't know what you mean by "half-notes"; I sense confusion.) This nonsense was devised centuries ago by innumerates, just like various aspects of the calendar.

But it is not some arbitrary historical accident that months are numbered from 1. It is the same reason the days of the month are numbered from 1. It is how ordinal numbers work!

Neil Armstrong was the 1st man on the moon - not the zero'th. Everywhere you have a sequence of discrete units, they are numbered starting from 1.

The thing with array indices in C is they are not ordinal numbers. They are offsets. Which means you can (at least in theory) do x[-1] to get the element before x. So a C array is not actually an array in the mathematical sense, it is just syntactic sugar for relative offsets in a larger array.

So what makes most sense? It really depends on what you want to achieve.

I think people get tricked by his way of building an argument. He structures it rhetorically almost like it is a mathematical proof, but if you read carefully the central argument is that it is "nicer"...in the particular example he picked. But if you don't pay attention it might feel like he "proves" that 0-based is universally better.

Then he goes on weird diatribes like the one about "corporate religion". I'm not sure, but I think he is trying to say that people who question his dubious logic are "sheeple".

I wasn't arguing so much that it was "objective proof", but that in contrast to "The reason for zero indexing is historical, related to pointer offsets. I don’t think anyone chose them to be easier for people.", I was pointing out that there are arguments for it as well. It's not just because "the address of an array is also the address of its first element in C".
Because of the author. If it had been someone else no one would use it as such.
> Dijkstra's paper makes the case from first-principles that the closed, open interval of [0,n) for sequences is the most appropriate

He argues that it is the most appropriate when indexing into an array of the natural numbers starting with 0. If the array in question started with 1, one-based indexing would be most appropriate following exactly the same logic!

I don't think that's a correct reading. You're talking about this section, correct?

> When dealing with a sequence of length N, the elements of which we wish to distinguish by subscript, the next vexing question is what subscript value to assign to its starting element. Adhering to convention a) yields, when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N.

He never specifies the contents of the array, he's talking about subscripts (i.e. indexes).

You may be right, but in that case, why is the second range "nicer"? AFACT it is because he explicitly adds 1 in the first example but implicitly subtract 1 in the second, which makes it looks cleaner. If you want the N'th element of an array, the range in the first example is N ≤ i < N+1 while in the second it is N-1 ≤ i < N. Written out like that, I don't see how the second is obviously nicer.
Sure. Something is generally considered more elegant ("nicer") when it has fewer terms. (I'm really happy with my own argument above why Python's indexing is "optimal".) But I'd put what I think you're saying this way: Dijkstra asserts that '1 ≤ i < N+1' is worse than '0 ≤ i < N', but he's choosing his conditions carefully. '1 ≤ i ≤ N' also has no extra terms. (Though in fairness to Dijkstra, this section was after he'd already argued why closed, open ranges are better.)

I will say this though: in all the noise in this sub-thread, I never got any example in answer to my original question asking for any algorithm or formula that works out better with 1-based indexing (my original response was to its parent's claim that 1-based indexing results in fewer ±1s in practice). Except for maybe the "stride" examples[1] for which I still don't understand why the starting index is important. I say this not in victory (ha ha! zero-based indexes are clearly superior!) but in disappointment because I was hoping to gain understanding of why Julia/Matlab and others (which are more geared towards math/stats which is outside of my experience) made their indexing choice. Particularly because it's against the norm, they must have good reasons.

[1] https://news.ycombinator.com/item?id=20425500

I'm actually arguing the opposite, that 0 ≤ i < N does have an extra term - it is just hidden! Where does the zero come from? The generalized form for picking the N'th number (with zero-based indexing) is N-1 ≤ i < N. So the 0 is N-1 which have been simplified because you know N is 1. But in that case couldn't you also just reduce the terms in the example with 1-based indexing similarly? Reducing the term in the one example to make it simpler is just cheating.

I'm not disputing the choice of closed-open rage, but I'm arguing the expressions are equally simple with either 1 and 0 based indexing.

As for an example where 1-based indexing is better, I had such an example in a another subthread: Translating between the numbers and names of months. Or indeed anywhere where you have a list of things and you want to pick them by ordinal numbers. E.g if you want the N'th president, it is simper to do presidents[N] than presidents[N-1].