by nish__ 20 days ago
Doesn't every language support multidimensional arrays? It's just an array of arrays, no? What am I missing?

An array of arrays is an extremely inefficient and error-prone way to represent multidimensional arrays.

If I want a 1000x1000 array, representing it physically as a single 1000000-element array requires one allocation, and processing it element-by-element (assuming it's stored in the same order we're iterating over it) is sequential in memory and therefore very efficient.

Representing it as 1000 separate 1000-element arrays requires 1000 allocations, and pointer-chasing every time we move from one row to the next.
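
To make the two layouts concrete, here is a minimal C++ sketch, assuming int elements and row-major order. The names `sum_flat` and `sum_nested` are hypothetical, picked only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat layout: one allocation, element (r, c) lives at index r * cols + c.
long long sum_flat(const std::vector<int>& a, std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += a[r * cols + c];   // indices increase by one each step
    return total;
}

// Nested layout: one outer allocation plus one separate allocation per row.
long long sum_nested(const std::vector<std::vector<int>>& a) {
    long long total = 0;
    for (const auto& row : a)           // moving to the next row follows a pointer
        for (int v : row)
            total += v;
    return total;
}
```

Both functions compute the same sum; the difference is only in how many heap blocks back the data and whether consecutive rows are adjacent in memory.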

Isn't an array of arrays by definition the sequential implementation?

Otherwise you would have an array of pointers to arrays. The usage (syntax) for them would be the same but the performance would not be.

They also have different uses. With an array of arrays, you would expect all the inner arrays to share the same length. With an array of pointers to arrays, you would expect the inner arrays to have dynamic, possibly different lengths.
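
The distinction can be sketched in C++ with built-in arrays. The sizes `M` and `N` and the helper `rows_are_adjacent` are assumptions made up for this example.

```cpp
#include <cstddef>

constexpr std::size_t M = 3, N = 4;   // illustrative sizes

// Array of arrays: the rows are stored inline, back to back.
static_assert(sizeof(int[M][N]) == M * N * sizeof(int),
              "no per-row pointers, just M * N ints");

// Array of pointers to rows: the outer array holds only addresses,
// and the row storage lives wherever those pointers point.
static_assert(sizeof(int*[M]) == M * sizeof(int*),
              "outer array stores M pointers, not the elements");

// Returns true when each row starts exactly where the previous one ends.
bool rows_are_adjacent(const int (&a)[M][N]) {
    for (std::size_t r = 0; r + 1 < M; ++r)
        if (&a[r][0] + N != &a[r + 1][0])
            return false;
    return true;
}
```

Both forms are indexed as `a[r][c]`, which is why the syntax looks identical even though the memory layout is not.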

Even in c++ could you not just define some int [1000][1000]foo? I've never really used C++, but my assumption from C is that this is 1000000 contiguous elements.

The C++ way to do it currently would be:

    std::array<std::array<T, N>, M> data;
Which is contiguous

    int data[M][N]; 
also works fine and is contiguous in C++

Edit:

For the stack at least. On the heap, you'd need to use a single std::vector<int> and compute the indices manually, or use std::mdspan.
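
A rough sketch of the "single std::vector<int> with manual indices" option, with dimensions chosen at run time. `Grid` is a hypothetical wrapper, not a standard type, and row-major order is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols grid of ints in one heap allocation.
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Row-major mapping from (r, c) to the flat index.
    int& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    // Read-only access to the underlying contiguous buffer.
    const int* data() const { return data_.data(); }

private:
    std::size_t rows_, cols_;
    std::vector<int> data_;   // value-initialized to zeros
};
```

For example, after `Grid g(1000, 1000); g.at(2, 3) = 7;` the value sits at offset 2 * 1000 + 3 of the one underlying buffer.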

It does not work fine in C++ when N and M are not compile-time constants, which is basically always the case in any interesting numerical algorithm. Also not in Rust.

It works fine in C though, or FORTRAN, or Ada, or ALGOL 60, ...

Which is why std::mdspan exists, and std::linalg.

NVidia pivoted to designing CUDA hardware with a focus on C++ back in , and it seems to be working out quite well for them.

CppCon 2017: "Designing (New) C++ Hardware"

https://www.youtube.com/watch?v=86seb-iZCnI

They were also the ones sponsoring the ISO work on mdspan, while HPC research labs are pushing for linalg on top.

I would rather be using Ada today, but that isn't how the world moves.

I see that they spend time making their hardware run general software, but I can't see anything in GPU hardware that is specific to std::mdspan.

I respect Ada but I would not want to use it. But if I have a choice between C++ with mdspan and C99's arrays, I choose C99 every time.

> Even in c++ could you not just define some int [1000][1000]foo?

If it fits on the stack, yes.

Typical code using MD-arrays is scientific code, and the data they manipulate generally do not fit there.

Would the compiler not allocate the memory contiguously on the heap in that case then? Seems like a reasonable thing to do.

Nope. The C++ memory model is designed around no hidden/non-deterministic memory allocation.

If you try to allocate 10MB on the stack, that's the dev's problem if the program fails; it's not the compiler's job to guesstimate whether something will fit there or not (and it's impossible anyway, since the compiler can't know all the stack sizes a program will ever run on).

I see. That makes sense.