| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by yodelshady 1965 days ago
	Open question: why are multi-dimensional arrays, and matrices specifically, so neglected in almost every other language? They map well to practically freakin' everything, for what seems like.. not that much effort on the language design side, but an enormous amount of tedious, duplicated effort on the user side.

6 comments

cbkeller 1965 days ago

That is a great question, and while I don’t know for sure, I think a relevant anecdotal observation that stands out to me is that many languages which do have first-class multidimensional arrays also have one-based indexing. And I would speculate this in turn is because most people who wanted matrices badly enough to make them a language feature wanted to do linear algebra with them, where you generally also want one-based notation since that is how all the equations in textbooks and papers are written.

So a language with multidimensional arrays is in a lose-lose position of having to choose to either satisfy the linear algebraists at the cost of alienating general-purpose programmers who want to do pointer arithmetic, or else satisfy the latter while alienating the core demographic for multidimensional numeric arrays.

Personally, I’m fine with (or even slightly prefer) one-based for my own scientific computing, despite starting with C, since it really is more elegant for linear algebra, and I have never found myself needing or wanting to do pointer arithmetic in a language that does have good multidimensional arrays — but clearly it is still a major turn-off to many others.

lambda_obrien 1965 days ago

I don't get why a language can't have both 0 and 1 based indexing, just set it as a compiler flag or something. They are a homomorphism, it's not hard to add or subtract one.

mixedCase 1965 days ago

That sounds like it belongs in the top 10% of the worst (programming-related) things I could wish upon another programmer: Having to maintain projects where you have to read the build system's configuration and keep it in mental context just to figure out if an array access is mathematically correct or not.

I would imagine this support to be stomachable if it was active at all times but using a visually different syntax for 0-based and 1-based indexing, maybe using something like the guillemet for the latter (like this: «1») that is wild enough to draw the user's attention when reading the code, and not common/prominent enough across keyboard layouts to make most users think twice before using it when writing code unless it makes sense for the domain.

lambda_obrien 1965 days ago

It's hard enough to understand a codebase, but if a language was like Haskell for instance (you can enable certain extensions in Haskell with a flag at the top of a file), you could have a line at the top of a file defining that this file is 0 based, while maybe the default is 1 based for the language. It doesn't seem like that much more than you already have to consider in many languages.

mixedCase 1964 days ago

Funny you put it like that, because the fragmentation regarding language extensions is frequently discussed as one of the biggest issues affecting real world Haskell.

tome 1964 days ago

It's mostly talked about by people that don't use Haskell! In practice it's not really a problem.

cbkeller 1965 days ago

You can actually switch starting indices fairly easily in Fortran [1], or for that matter Julia, but for some reason the option doesn’t seem to be particularly widely used. People seem to give a lot of weight to the default, even when the effort to change is minimal. Added complexity/ambiguity?

[1] https://stackoverflow.com/questions/48562873/zero-indexed-ar...

benibela 1965 days ago

Why can't each array have its own indexing?

You could declare the array as array[0..100] of something resp. array[1..100] of something for 0/1 based, and just as well do array[2..100] of something, or array[-1..100] of something, for -1/2 based.

Or with letters array['a'..'z'] of something,

Or with enums. TEnum = (red, blue, green), and then an array array[red..green] of something

Actually, that is just Pascal

szc 1965 days ago

This would all be fine until you start passing and returning these arrays and indexes between independently implemented functions and modules. Then you'd have to start declaring an index type tied to a specific array or a way to look up the index base for a passed in array.

tharkun__ 1965 days ago

Which would be fine actually for the program itself, if this is all figured out by the compiler/at runtime, though obviously more inefficient "for no good reason".

It would wreak havoc on the programming side though. Have you ever worked on a code base with more than a few people? Have you ever had to deal with different preferences programmers have? Differences in opinion between smart, slightly (or not so slightly) non-social people?

Using 0 or 1 indexed arrays, the new space vs tabs war! Or the new 2 spaces vs. 4 spaces vs. 8 spaces vs. using tabs war!

kevin_thibedeau 1965 days ago

Ada lets you use any integer subtype as an array range. The first index can start anywhere you wish. It's string library defaults to 1-based but you can easily interoperate with strings using a different subtype. Code can be written that is base agnostic by using the attribute mechanism or by aliasing arrays to use a known range.

Gibbon1 1965 days ago

I remember some version of BASIC from the early 80's. You could create real multi-denominational arrays including sparse ones with any starting index(s) you wanted.

Matrix operations were first class.

hilbert42 1965 days ago

I curious, can you remember which ones?

Gibbon1 1964 days ago

I think it was a SDS Basic or a variant.

https://en.wikipedia.org/wiki/SDS_BASIC#MAT_commands

hilbert42 1963 days ago

OK, thanks. I've not used that. The problem I had was that I learned Fortran before coming across BASIC, so converting Fortran programs was often a problem when features where missing.

analog31 1965 days ago

Visual basic has that. Pascal lets you specify any index range.

znpy 1965 days ago

My dumb guess is that whereas Fortran was kinda designed with scientific computation as a first class use case and c/c++ let you do anything with memory (and make it easy to use inline assembly to access specific instructions), most other languages are designed as general purpose language without scientific (as in "high performance") computing as a main design objective.

And thus specialized data type, operators and syntax look like overhead. And thus language designers leave it out.

hyperrail 1965 days ago

One interesting wrinkle is that traditionally (dating back to FORTRAN IV at least), Fortran compilers store matrices in RAM address space using column-major order, where consecutive elements are in the same column, not the same row. [1]

Most languages that prioritize Fortran code interop also adopt column-major order, but most other languages that support multidimensional arrays do row-major order. I'm not sure why Fortran went column-major but because it did, a lot of libraries designed for Fortran callers (such as LAPACK and all BLAS implementations) need to be told that input arrays have been transposed when they come from languages like C++.

[1] https://en.wikipedia.org/wiki/Row-_and_column-major_order

leephillips 1965 days ago

That’s one reason why APL and its descendants are so powerful in the hands of people who have become fluent in viewing computation through the lens of arrays.

tryonenow 1965 days ago

Because the only place you really need them is in mathematical/scientific computing, and while many scientists tend to pick up a little programming for their research, the vast majority of computer scientists and programmers, in particular the ones with the interest and ability to work on languages, are not concerned with mathematical modeling/processing. It is a fairly rare interdisciplinary combination.

hctaw 1965 days ago

Plenty of DSA work has more optimal representation and evaluation as matrix arithmetic. Particularly graph traversals and transforms which are heavily used in optimizing compilers.

tharkun__ 1965 days ago

I think you just proved the parent's point: Most programmers working on 'business software' do not need this as a built-in feature of the language.

hntrader 1965 days ago

This is a great question and I hope you get an answer to this.

The number of woeful ad-hoc solutions I've seen to people handling matrix data in Java/C# for what should've been otherwise very basic analysis ...

Something with pandas-like capability in a lower level language would be amazing.

krona 1965 days ago

Something with pandas-like capability in a lower level language would be amazing.

More like numpy than pandas but I've used x-tensor (c++) with great success in several projects: https://xtensor.readthedocs.io/en/latest/index.html

nnm 1965 days ago

Every time I was forced to pandas I miss R data.frame and data.table.

The_rationalist 1965 days ago

Java has faster multi dimensional array than numpy, e.g nd4j can trivially leverage Intel MKL or even offload to CUDA.

modeless 1965 days ago

C# has multidimensional arrays. It's not Pandas or numpy, but it's something.