by kazinator 2533 days ago
Zero-based indexing objectively has various convenient properties that one-based indexing lacks.

It's just that convenience isn't objectively better than inconvenience, that's all.

Objectively speaking, if I find the least positive residue of some integer modulo M, I get a value from 0 to M-1. If my M-sized array is from 0 to M-1, that is objectively convenient:

  hash_table[hash(string) % table_size]

Objectively speaking, if I have some files in a directory and I give them zero-based names like file000, file001, ... then I can objectively refer to the first volume of ten using the single pattern file00?, and the next volume as file01?. If they are numbered from 001, I need two patterns, file00? and file010, to match the first ten. For the next ten, the file01? pattern unfortunately matches file010, so I objectively need some way to exclude it.
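To make the file-name point concrete, here is a minimal Python sketch. It assumes thirty hypothetical files and uses the standard fnmatch module as a stand-in for shell globbing; the helper name matching is made up for illustration.

```python
from fnmatch import fnmatch

# Hypothetical file names: thirty files, numbered from 0 and from 1.
zero_based = [f"file{n:03d}" for n in range(0, 30)]
one_based = [f"file{n:03d}" for n in range(1, 31)]


def matching(names, pattern):
    """Return the names that match a shell-style pattern."""
    return [name for name in names if fnmatch(name, pattern)]


# Zero-based: each pattern selects exactly one group of ten.
first_ten_zero = matching(zero_based, "file00?")   # file000 .. file009
second_ten_zero = matching(zero_based, "file01?")  # file010 .. file019

# One-based: "file00?" finds only nine names, and "file01?" picks up
# file010, which belongs to the first group of ten.
first_try_one = matching(one_based, "file00?")     # file001 .. file009
second_try_one = matching(one_based, "file01?")    # file010 .. file019
```

With the one-based names, the first group of ten needs the extra pattern file010, and the second group needs file010 excluded again.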

Objectively speaking, if I have zero-based indices into a 3D array, I can find the address of an element <i, j, k> using the homogeneous Ai + Bj + Ck rather than Ai + Bj + Ck + D, which objectively adds an extra term.
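A small Python sketch of where the extra term D comes from. The dimensions, the row-major layout, and the function names are assumptions for illustration only; the comment itself does not specify a layout.

```python
# Hypothetical dimensions of a 3D array stored in row-major order.
NI, NJ, NK = 2, 3, 4

# Strides for i, j and k (the A, B, C of the formula).
A, B, C = NJ * NK, NK, 1


def offset_zero_based(i, j, k):
    """Flat zero-based offset of <i, j, k>, with every index starting at 0."""
    return A * i + B * j + C * k


# With one-based indices and a one-based flat position, the formula is
# A*(i-1) + B*(j-1) + C*(k-1) + 1, which expands to A*i + B*j + C*k + D.
D = 1 - (A + B + C)


def position_one_based(i, j, k):
    """Flat one-based position of <i, j, k>, with every index starting at 1."""
    return A * i + B * j + C * k + D
```

In the zero-based version the constant term vanishes; in the one-based version it has to be carried around (or folded into the base address).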

Objectively speaking, the zero-based byte index B can be converted to a four byte word index using B / 4 (truncating division), and within that word, the zero-based byte local offset is B % 4. Objectively speaking, the same conversion from 1-based bytes to 1-based words requires (B-1)/4+1 and (B-1)%4+1, which is objectively more syntax and more nodes in the abstract syntax tree.
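The two conversions side by side, as a Python sketch. The function names are made up for illustration; note that Python's // is floor division, which agrees with truncating division for the non-negative indices used here.

```python
WORD_SIZE = 4  # bytes per word, as in the comment above


def split_zero_based(b):
    """Zero-based byte index -> (zero-based word index, zero-based byte offset)."""
    return b // WORD_SIZE, b % WORD_SIZE


def split_one_based(b):
    """One-based byte index -> (one-based word index, one-based byte offset)."""
    return (b - 1) // WORD_SIZE + 1, (b - 1) % WORD_SIZE + 1
```

The one-based version is the zero-based one wrapped in a shift down by one on the way in and a shift up by one on the way out.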

There is no reason I should like shorter, simpler, faster; that's a purely subjective aesthetic. After all, a short poem isn't better than a long one; a bacterium isn't better than a rhinoceros; and so on.

Hey, how about those one-based music intervals? C to E is a major third and all that? We have a diatonic scale with seven notes, right? As we ascend, whenever we cycle through seven notes, we have passed ... one more octave. And to invert an interval, we subtract from ... why, nine of course! And a fifth stacked on top of a fifth is a major ninth. There is objectively more cruft in 1 based music theory than 0 based. But there is no accounting for people liking it that way, right?
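The interval arithmetic can be sketched in Python. This only tracks interval numbers, ignoring quality (major, minor, and so on), and the function names are hypothetical. In the one-based (traditional) scheme a third is 3, a fifth is 5 and an octave is 8; in the zero-based scheme they would be 2, 4 and 7.

```python
def stack_one_based(a, b):
    """Stack two traditionally numbered intervals: the shared note is counted twice."""
    return a + b - 1


def invert_one_based(n):
    """Invert a simple (within-octave) interval in traditional numbering."""
    return 9 - n


def stack_zero_based(a, b):
    """Stack two intervals numbered by steps spanned: plain addition."""
    return a + b


def invert_zero_based(n):
    """Invert a simple interval numbered by steps spanned: subtract from the octave (7)."""
    return 7 - n


ninth = stack_one_based(5, 5)              # fifth + fifth = ninth (9), not "tenth"
sixth = invert_one_based(3)                # a third inverts to a sixth
eight_steps = stack_zero_based(4, 4)       # four steps + four steps = eight steps
five_steps = invert_zero_based(2)          # two steps invert to five steps
```

The zero-based versions need no correction term and subtract from the octave itself rather than from nine.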

2 comments

You do realize that there are approximately just as many use cases where 1-based indexing is more natural and involves fewer operations, right? It's highly problem- and context-dependent.

But I'm not trying to make the point that 0-based or 1-based is better or worse than the other. I'm just saying that it's a borderline immaterial difference for most use cases, and Julia gives many, many tools for getting around any problems that may arise when a certain indexing scheme is awkward.

The 0 or 1 based debate is one of the most boring and pedantic arguments one can have and I do my best to ridicule people when they try to start it.

"Objectively speaking" there are pros and cons to each system. The largest pro of 0-based indexing is of course that it can correspond to a memory address plus an offset, which is the reason C (and derived languages) use 0-based indexing.

But it is also an objective fact that using 1-based indexing means that the index corresponds to the ordinal numbers, e.g. index 1 is the first element, index 2 is the second element and so on. This also has a number of convenient properties.

For example February is the 2. month, so if you have a list of the names of the months, you would expect month_names[2] to be February. With zero-based indexing you would have to write month_names[month_number - 1]. And if you want to get the month number from the name, you would have to write month_names.index_of(month_name) + 1. Be careful not to mix up the +1 and -1!
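In a zero-based language such as Python, the two conversions look roughly like this. It is only a sketch: the helper names are invented, and Python's list method is index rather than the index_of used above.

```python
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def name_of(month_number):
    """Calendar month number (1..12) -> name, shifting down to a list index."""
    return MONTH_NAMES[month_number - 1]


def number_of(month_name):
    """Month name -> calendar month number (1..12), shifting the list index up."""
    return MONTH_NAMES.index(month_name) + 1
```

The -1 and +1 have to stay paired with the right direction of conversion, which is the risk the comment points out.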

As for music theory, a third is called so because it spans three half-notes. It describes the size of a range, which is independent of the offset of the indices. By the same token decades are 10 years (not 9) and centuries are 100 years (not 99).

Some machine-level implementation convenience is the smallest advantage. Zero based would be better even if it cost more at the machine level. Of course it doesn't cost more because the advantages are relevant at the implementation level also.

> For example February is the 2. month, so if you have a list of the names of the months, you would expect month_names[2] to be February.

That's not 1-based indexing being good; that's conforming to (or reflecting) an externally imposed 1-based system that is itself questionable.

Should the seconds of a minute go from 1 to 60 instead of 0 to 59? Dates and times are full of poorly chosen conventions, including ones that don't match people's intuitions. For instance, many people celebrated the new millennium in January 2000. People also want decades to go from 0 to 9; the "eighties" are years matching the pattern 198?, not 1981 to 1990. Yet the 20th century goes from 1901 to 2000.
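The mismatch between the two conventions shows up in the arithmetic. A Python sketch for positive years only, with hypothetical function names:

```python
def decade_prefix(year):
    """Zero-based style: the 'eighties' are the years whose prefix is 198."""
    return year // 10


def century_ordinal(year):
    """One-based style: the 20th century runs from 1901 through 2000."""
    return (year - 1) // 100 + 1
```

Under these definitions 1980 through 1989 share the prefix 198, while the year 2000 still belongs to century 20 and 2001 is the first year of century 21.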

In many situations where 1-based numbering is used, it's just symbols in a sequence. It could be replaced by Gray code, Greek letters, or Japanese kana in i-ro-ha order.

When the arithmetic properties of the index matter to the point that it's involved in multiplication (not merely in a successor/predecessor relationship), it is advantageous to make the origin zero.

If month_name[1] must be "January", I'm okay with wasting month_name[0]; that's better than supporting a one-based array (let alone making that default).
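A sketch of the wasted-slot approach in Python, assuming a None placeholder at index 0. The table and function names are hypothetical, and the range check is there because Python would otherwise accept index 0 and negative indices silently.

```python
# Slot 0 is deliberately unused so that calendar month numbers index directly.
MONTH_NAME = [
    None,
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def month_name(month_number):
    """Look up a month by its calendar number (1..12) with no index shifting."""
    if not 1 <= month_number <= 12:
        raise ValueError("month number must be in 1..12")
    return MONTH_NAME[month_number]
```

The cost is one unused element; the array itself stays zero-based.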

> As for music theory, a third is called so because it spans three half-notes.

No it doesn't; in that very same music theory, a "major second" interval is also known as "one step" or a "whole step"! A third is "two steps"; that's what it spans. (I don't know what you mean by "half-notes"; I sense confusion.) This nonsense was devised centuries ago by innumerates, just like various aspects of the calendar.

But it is not some arbitrary historical accident that months are numbered from 1. It is the same reason the days of the month are numbered from 1. It is how ordinal numbers work!

Neil Armstrong was the 1st man on the moon, not the zeroth. Everywhere you have a sequence of discrete units, they are numbered starting from 1.

The thing with array indices in C is they are not ordinal numbers. They are offsets. Which means you can (at least in theory) do x[-1] to get the element before x. So a C array is not actually an array in the mathematical sense, it is just syntactic sugar for relative offsets in a larger array.
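The offset view of C indexing can be imitated in Python with a small wrapper class. This is only an analogy under stated assumptions: OffsetView is a made-up class, and Python's built-in x[-1] on a list means "last element", which is a different thing from the C behaviour being described.

```python
class OffsetView:
    """Hypothetical stand-in for a C pointer into a larger array."""

    def __init__(self, buffer, base):
        self.buffer = buffer  # the larger underlying array
        self.base = base      # position the "pointer" refers to

    def __getitem__(self, offset):
        # Like C's x[offset], this reads the element at base + offset.
        position = self.base + offset
        if not 0 <= position < len(self.buffer):
            # C would not check this; here we refuse instead of misbehaving.
            raise IndexError("offset falls outside the underlying buffer")
        return self.buffer[position]


buffer = [10, 20, 30, 40, 50]
x = OffsetView(buffer, 2)  # x "points at" the element 30

current = x[0]    # the element x refers to
previous = x[-1]  # the element just before it
following = x[1]  # the element just after it
```

Seen this way, the subscript is a signed distance from a base position rather than an ordinal label.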

So what makes most sense? It really depends on what you want to achieve.