Hacker News
by thwarted 3953 days ago
The visuals are definitely valuable in explaining this.

It used to be popular, and still is in some circles, to debate whether programming languages ought to start array indexing at 0 or 1.

When talking about this with other programmers, I've discovered that a lot of the issues/confusion could be avoided by consistent use of terminology: Offsets/offsetting always being zero-based and indexes/indexing always being one-based.

Using rulers and birthdays also helps to explain differences. You're in the first year before your first birthday, being zero (whole) years old.

To make matters potentially more confusing, culturally, I remember something about the ground floor in UK buildings not being "Floor 1" like it is in the United States.

http://www.hacker-dictionary.com/terms/fencepost-error

6 comments

I would argue an index, in the sense that it might be an integer, is indistinguishable from an offset. I would refer to one-based indexing as positional—1st, 2nd, 3rd, ..., ith.

Of course, it's the offset from the first element, so it's kind of a circular definition.

It's sometimes useful to distinguish an offset from a name. For instance, the number you assign to a gravitational potential energy depends on the origin you choose, but once you choose an origin you can compute energy differences, and those don't depend on it.

Another example is screen space versus screen displacement. This is the difference between affine space and a vector space. Whether the upper corner is (0,0) or (222,22) shouldn't matter as long as you are doing everything relative to some point.
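A minimal C sketch of that distinction, assuming integer screen coordinates. The type and function names (`Point`, `Displacement`, `point_diff`, `point_move`) are made up for illustration. Subtracting two points gives a displacement, a point plus a displacement gives a point, and there is deliberately no way to add two points:

```c
#include <assert.h>

typedef struct { int x, y; } Point;           /* a location in screen space */
typedef struct { int dx, dy; } Displacement;  /* a difference of two locations */

/* point - point = displacement */
Displacement point_diff(Point a, Point b)
{
    return (Displacement){ a.x - b.x, a.y - b.y };
}

/* point + displacement = point */
Point point_move(Point p, Displacement d)
{
    return (Point){ p.x + d.dx, p.y + d.dy };
}

/* Returns 1 if the displacement from b to a is unchanged after moving
   both points by the same shift, i.e. after relabeling the origin. */
int displacement_ignores_origin(Point a, Point b, Displacement shift)
{
    Displacement before = point_diff(a, b);
    Displacement after  = point_diff(point_move(a, shift), point_move(b, shift));
    return before.dx == after.dx && before.dy == after.dy;
}

/* Usage sketch: the same two points, with the corner relabeled (222,22). */
void demo(void)
{
    Point a = { 10, 5 }, b = { 4, 1 };
    Displacement shift = { 222, 22 };
    assert(displacement_ignores_origin(a, b, shift));
}
```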

In C, I argue we always use offsets. Each element of an array A is at a particular memory location, the name of the array being the first memory location, and then A[i] means take the thing at A+i. Notice that the difference p-q between two memory locations p and q is exactly the offset you put into an index expression: q[p-q] == *p.
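A small C sketch of those identities. The helper names `offset_of` and `identities_hold` are made up for illustration, and both pointers are assumed to point into the same array (otherwise the subtraction is undefined):

```c
#include <assert.h>
#include <stddef.h>

/* Offset of p from q, counted in elements.
   Both pointers must point into (or one past the end of) the same array. */
ptrdiff_t offset_of(const int *q, const int *p)
{
    return p - q;
}

/* Returns 1 if the identities from the comment hold on a small array. */
int identities_hold(void)
{
    int a[5] = { 10, 20, 30, 40, 50 };
    const int *q = a;       /* the array name decays to &a[0] */
    const int *p = &a[3];
    ptrdiff_t d = offset_of(q, p);

    assert(d == 3);         /* p is three elements past q */
    return a[3] == *(a + 3) /* A[i] means the thing at A+i */
        && q[d] == *p       /* q[p-q] == *p */
        && q[0] == *q;      /* offset 0 is the first element */
}
```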

That said, it is convenient to confuse the offset with the memory location, since (1) the memory location is likely not known when writing the program, and (2) if it were known, it would be almost impossible to use.

Now, an anecdote: I was helping implement a QR factoring algorithm from a textbook which uses 1-based indexing in a language which uses 0-based offsets. We tried changing the bounds of the nested loops to account for the difference, but it was basically impossible to avoid off-by-one errors. So, we left the loop bounds as the textbook had them and instead indexed like A[i-1], since this i-1 is the offset from A[0], the array element labeled 1.
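To illustrate the approach (this is not the code from the anecdote, just a sketch in the same spirit), here is back substitution for an upper-triangular system R x = b, the step that typically follows a QR factorization. The loop bounds are kept exactly as a 1-based textbook formula would state them, and the hypothetical accessor macros `R`, `X`, `B` do the i-1 translation in one place. Row-major storage and a nonzero diagonal are assumed.

```c
#include <assert.h>

/* Hypothetical accessors: textbook index i in 1..n maps to offset i-1.
   They rely on r, x, b and n being in scope at the point of use. */
#define R(i, j) r[((i) - 1) * n + ((j) - 1)]
#define X(i)    x[(i) - 1]
#define B(i)    b[(i) - 1]

/* Solves R x = b for an upper-triangular, row-major n-by-n matrix R.
   Bounds follow the 1-based formula
     x_i = (b_i - sum_{j=i+1..n} r_ij * x_j) / r_ii,  for i = n down to 1. */
void back_substitute(int n, const double *r, const double *b, double *x)
{
    assert(n >= 1);
    for (int i = n; i >= 1; i--) {
        double s = B(i);
        for (int j = i + 1; j <= n; j++)
            s -= R(i, j) * X(j);
        X(i) = s / R(i, i);
    }
}
```

The trade-off is that every access pays a subtraction, but the loops can be checked against the book line by line.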

> I remember something about the ground floor in the UK buildings not being "Floor 1" like it is in the United States.

Actually, that's perfectly explained with your offset vs. index terminology. In some countries, the floor number is an index within the array of floors. In others, it's an offset from the ground.

Except in the UK, where there often is a Mezzanine floor somewhere above the ground floor (usually, but not always, between the ground floor and the first floor).

Is there an Esolang that numbers its arrays with 0,M,1,2,3...?

A mezzanine is, by definition, a floor offset a non-integral number of storeys from the floors around it; a floor existing at a fractional floor number, in other words. A mezzanine "between the first and second floor" (in American parlance) would have a floor offset of 0.5 (or possibly ranging from 0.3 to 0.7, since mezzanines usually involve complex arrangements of stairs and landings).
Oh, I know it's perfectly explained by that, but using an example of building floors has cultural idioms. Ruler measurements and birthdays seem to be more universal.
Ah, ok.

In that case, wouldn't the floors actually be a good real-world example to students? It's a case where the index vs. offset convention seems to be split roughly 50%/50% around the world.

If people can't agree about which way is better for numbering floors, it's no surprise that number-crazy programmers can't agree about numbering a whole lot of other things :)

> In that case, wouldn't the floors actually be a good real-world example to students? It's a case where the index vs. offset convention seems to be split roughly 50%/50% around the world.

I suppose it would be a good real-world example of the contrast, but I was saying I don't think it's a good, universal example to explain indexing or offsetting specifically. "You start your fourth year alive on your third birthday" is almost universally understood, whereas "You exit the building on the floor numbered 1" is highly idiomatic.

The birthday analogy is a good one, I'll have to keep that in mind. Your actual "birthdays" are the points between year elements, and your "age" is the offset from birth.

As to the point about floor numbering, the situation can be a little confusing in Canada. Some buildings label in the US style, labeling stories [1st, 2nd, 3rd] while others label in the UK style as [Ground, 1st, 2nd]. We also often mix them and you'll see [Ground, 2nd, 3rd], with ground sometimes replaced by main, or lobby.

[Lobby/Ground, 2nd, 3rd] is common in the US too
I prefer that model for floor numbering, personally. It means if you're on floor N that you're N stories above the ground.
Not that it matters, but I feel the opposite. Barring the presence of subterranean levels, if you're on the ground floor, you are standing on the first literal "floor" of the structure.
Well, depends which country you live in.

In British English, the first floor is the first above the ground.

In American English, the first floor is the ground.

> You're in the first year before your first birthday, being zero (whole) years old.

I would argue that your "first birthday" is, in fact, the day you are born—your birth day. The thing that happens for the first time a year later, is the first anniversary of your birth day.

And I'd say that the word "birthday" is defined to mean "anniversary of the day you were born"
> It used to be popular, and still is in some circles, to debate whether programming languages ought start array indexing at 0 or 1

This is an exemplary case of citation needed if I ever saw one. Maybe it's a valid debate for programming languages that don't allow people to do pointer arithmetic, which already restricts the field a lot, but even then that sounds like part of the 4GL bullshit that never really took off, and for good reasons.

I don't know what kind of citation would satisfy you. These debates still come up in e.g. the Lua (1-based) mailing list, and used to be everywhere.

Visual Basic had the "OPTION BASE" statement to select [0]. (Many other versions of BASIC did too.)

APL also has the ⎕IO Index Origin setting [1]

If you want to see a lively debate, there's c2[2], and there's also Dijkstra[3]

[0] https://msdn.microsoft.com/en-us/library/aa266179%28v=vs.60%...

[1] https://en.wikipedia.org/wiki/APL_syntax_and_symbols

[2] http://c2.com/cgi/wiki?ZeroAndOneBasedIndexes

[3] http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF

I like Mike Hoye's historical treatise on 0- vs. 1-based indexing in [0.5]. A very interesting read in several ways. Some excerpts:

[...] before pointers, structs, C and Unix existed, at a time when other languages with a lot of resources and (by the standard of the day) user populations behind them were one- or arbitrarily-indexed, somebody decided that the right thing was for arrays to start at zero.

[...] the technical reason we started counting arrays at zero is that in the mid-1960’s, you could shave a few cycles off of a program’s compilation time on an IBM 7094. The social reason is that we had to save every cycle we could, because if the job didn’t finish fast it might not finish at all and you never know when you’re getting bumped off the hardware because the President of IBM just called and fuck your thesis, it’s yacht-racing time.

[0.5] http://exple.tive.org/blarg/2013/10/22/citation-needed/

> sound as part of the 4GL bullshit

Actually it's primarily early languages plus Lua.

[0] https://en.m.wikipedia.org/wiki/Comparison_of_programming_la...

The math DSLs like Matlab, Julia, and Mathematica would like to chime in and say 1-based indexing translates better to and from math research/lit.
Sometimes; other times math desperately wants indexes to start at 0 for the same "offset" reasons, to avoid having to add one. If you're using indexes as subscripts, some formulas start subscripting at 0. If you're building a series, many series start indexing at 0, not least because you often want the first term to involve a 0 in the exponent to make a constant term. Polynomial powers start at 0. 0 is a more common bound for integrals than 1. Angles start at 0. Physics values start at 0.

If anything, mathematics provides as much of a reason as pointer arithmetic to start indexing at 0. Indexing from 1 occurs more if you're creating a one-to-one correspondence with some real-world object, and you want to number those objects starting from 1, perhaps because that's a convenient user-visible numbering.
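The point about polynomial powers can be sketched in C. Here the array offset k is also the exponent of x, so the constant term sits at c[0] with nothing to add or subtract. The function name `poly_eval` and the choice of Horner's rule are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluates c[0] + c[1]*x + ... + c[n-1]*x^(n-1) by Horner's rule.
   Offset k is the exponent of x, so c[0] is the constant term.
   An empty coefficient array (n == 0) evaluates to 0. */
double poly_eval(const double *c, size_t n, double x)
{
    assert(c != NULL || n == 0);
    double acc = 0.0;
    for (size_t k = n; k-- > 0; )   /* k runs n-1, n-2, ..., 0 */
        acc = acc * x + c[k];
    return acc;
}
```

With 1-based storage the same loop would need c[k+1] for the x^k coefficient, which is exactly the "add one" that the comment describes.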