by matjet 1654 days ago
Multidimensional array construction is something I had looked forward to in Julia. I am not convinced that the approach taken in Julia 1.7 compares favorably with other language implementations (ignoring R). In my view the numpy syntax has more clarity for this task: [[[1,2],[3,4]],[[5,6],[7,8]]] compared with [1 2;3 4;;;5 6;7 8].

It is not immediately clear why [1,2,3,4] is equivalent to [1;2;3;4], but [1,2,,3,4] (and [1,2;3,4] vs [1 2;3 4], etc.) is not equivalent to [1;2;;3;4]. For creating a 3d slice, I expected that ";", ";;", ";;;" would each refer to incrementing a specific dimension. E.g., it seems intuitive that if you can create a 2d matrix with [1 2;3 4], then you should be able to make a 3d tensor with [1 2;3 4;;5 6;7 8].

4 comments

The semicolon use is actually completely consistent: ";", ";;", ";;;" etc. do indeed refer to incrementing the corresponding dimension. Try [1;2;3], then [1;2;3 ;; 4;5;6] and then [1;2;3 ;; 4;5;6 ;;; 7;8;9;; 10;11;12].
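A small sketch of that progression, checked only through `size`. It assumes Julia 1.7 or later (where repeated semicolons are accepted), and the variable names are just for illustration:

```julia
# Assumes Julia >= 1.7. The names v, m, t are illustrative only.
v = [1; 2; 3]                                    # ";" concatenates along dimension 1
m = [1; 2; 3;; 4; 5; 6]                          # ";;" concatenates along dimension 2
t = [1; 2; 3;; 4; 5; 6;;; 7; 8; 9;; 10; 11; 12]  # ";;;" concatenates along dimension 3

println(size(v))   # (3,)
println(size(m))   # (3, 2)
println(size(t))   # (3, 2, 2)
```

Each extra semicolon adds one more axis to the result, and the elements separated by a single ";" always run down a column.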

The confusion arises because "," and whitespace also have overlapping meanings in array notation. "," is used for regular vector creation, and whitespace for concatenation in the second dimension. The coexistence of those two different notations is a bit uneasy, but I doubt that "," and whitespace will be deprecated, since they are so entrenched and familiar. And for 1D and 2D arrays (which probably make up >99% of all literal array use) the older notation is also more elegant and clean.

Maybe this will help: "," separators only work for 1D arrays; you cannot use them while making 2D or higher arrays. Whitespace is used when you want to write the data down row-wise, so the innermost dimension as written is actually the second dimension of the array. The semicolons go consistently from the first to the n'th dimension, with n semicolons marking concatenation along dimension n.

I think it would be hard to come up with a nice way to express these in a unified notation.

The way numpy does this with lots of brackets isn't really very convenient when working in 2D, which is the more common case.

To address the specific examples: [1,2,3,4] is literal vector creation, and also "," is just the regular way you create a list of inputs to a function. [1;2;3;4] is concatenation along the first dimension, so it must be the same as [1,2,3,4].

[1,2,,3,4] has no meaning, because repeated "," hasn't been given any syntactical meaning. But maybe that would have been a good idea?

[1,2;3,4] mixes literal vector syntax and vertical concatenation. The only reasonable interpretation would be that it's the same as [1;2;3;4], so maybe it could have been allowed. But ";" is supposed to concatenate arrays (with a special case for scalars, treated as 0-dimensional arrays), and it's not clear to me what would be concatenated in [1,2;3,4].
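To see that both comma forms are refused rather than silently reinterpreted, here is a rough check. `is_rejected` is a hypothetical helper written for this comment, not part of Julia; it treats any error while parsing or evaluating the literal as a rejection:

```julia
# Hypothetical helper (not a Julia built-in): returns true if the literal
# fails either to parse or to evaluate.
function is_rejected(src::AbstractString)
    try
        eval(Meta.parse(src))
        return false
    catch
        return true
    end
end

println(is_rejected("[1,2,,3,4]"))   # true
println(is_rejected("[1,2;3,4]"))    # true
println(is_rejected("[1;2;3;4]"))    # false
```

The strings are parsed at run time so that the rejected literals do not stop the whole snippet from loading.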

[1 2; 3 4] on the other hand, concatenates two row vectors vertically, so this has a clear meaning. It can't be equivalent to [1;2;;3;4], since that has 1 and 2 lying along a column not a row.
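A quick sketch of those equivalences, assuming Julia 1.7 or later (the names a to d are only for illustration):

```julia
# Assumes Julia >= 1.7; names are illustrative.
a = [1, 2, 3, 4]     # literal vector
b = [1; 2; 3; 4]     # four scalars concatenated along dimension 1
c = [1 2; 3 4]       # two rows, stacked vertically
d = [1; 2;; 3; 4]    # two columns, placed side by side

println(a == b)                 # true
println(c == d)                 # false
println(d == [1 3; 2 4])        # true
println(c == permutedims(d))    # true
```

So [1;2;;3;4] is the transpose of [1 2; 3 4], because 1 and 2 share a column in one and a row in the other.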

A 3D tensor can't be [1 2;3 4;;5 6;7 8], since it only has ";;" while concatenation along the 3rd dimension must be ";;;". The notation [1 2;3 4;;; 5 6;7 8] works for this, but mixing whitespace notation and ";" notation is confusing.
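As a sanity check on the `;;;` form, this sketch compares it with an explicit `cat` along the third dimension (Julia 1.7 or later assumed; `t` and `u` are illustrative names):

```julia
# Assumes Julia >= 1.7; names are illustrative.
t = [1 2; 3 4;;; 5 6; 7 8]                 # two 2x2 matrices stacked along dimension 3
u = cat([1 2; 3 4], [5 6; 7 8]; dims=3)    # the same thing spelled with cat

println(size(t))         # (2, 2, 2)
println(t == u)          # true
println(t[:, :, 2])      # [5 6; 7 8]
```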

So, clearly this is all a bit complicated, but it is a solution to a somewhat complicated problem: you need to allow a new, consistent notation while keeping the historical notation, which is in fact better in the most common (lower-dimensional) cases.

Yeah, I looked up the manual and completely fail to understand how the syntax is supposed to be read and written.
I'd love to figure out where the disconnect is, and how we can make the manual clearer. It's a pretty simple rule: the number of semicolons specifies the dimension in which you "move". I've seen two disconnects, but yours might be different:

* Julia's arrays are column major but when you use spaces to write them, you do so in row major fashion. This new syntax enables a column major input: `[1 2; 3 4]` is equivalent to `[1; 3;; 2; 4]`.

* When you're using spaces, it might feel "funny" to jump from one semicolon (which concatenates the rows in the first dimension) to three semicolons (which concatenates the matrices in the third dimension) in an expression like `[1 2; 3 4;;; 5 6; 7 8]`, but the key is that spaces first build rows and then the semicolons concatenate them along a particular dimension.
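The first point can be sketched like this, assuming Julia 1.7 or later (`A` and `B` are illustrative names). `vec` flattens a matrix in storage order, so it shows the column-major layout directly:

```julia
# Assumes Julia >= 1.7; names are illustrative.
A = [1 2; 3 4]       # written row by row
B = [1; 3;; 2; 4]    # written column by column

println(A == B)      # true
println(vec(A))      # [1, 3, 2, 4]
```

The semicolon-only form lists the elements in the same order that `vec` returns them, which is what "column major input" means here.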

Anyhow, if you can expand on what's causing trouble, it'd be great to figure out how to improve the description in the manual.

It's really hard to say in what way I don't understand something that I'm not sure I even understand.

I think one approach is to maybe fully explain the column major approach, and then introduce the row major approach, and lastly how those two interact.

> `[1 2; 3 4]` is equivalent to `[1; 3;; 2; 4]`.

is a hard-to-understand example.

Is the space equal to ;;? e.g., is [1 2; 3 4] the same as [1;;2;3;;4]?

The ;; is similar to the space, in that it separates elements in the second dimension. But they are not quite equivalent, because you cannot write [1;;2;3;;4]. With semicolons you have to start with the innermost dimension and work outwards. The space notation is there so that you can write the array in row-order, since that is very common.
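A rough illustration, assuming Julia 1.7 or later. The first comparison shows where ";;" and the space agree; the second tries the literal from the question at run time and records whether it raised an error (`row` and `threw` are illustrative names):

```julia
# Assumes Julia >= 1.7; names are illustrative.
row = [1;; 2;; 3;; 4]          # ";;" alone: elements laid out along dimension 2
println(row == [1 2 3 4])      # true

# Parse and evaluate the questioned literal at run time, so that a failure
# is caught here instead of aborting the whole snippet.
threw = try
    eval(Meta.parse("[1;;2;3;;4]"))
    false
catch
    true
end
println(threw)
```

With semicolons only, the working spelling of [1 2; 3 4] is [1; 3;; 2; 4]: the single ";" groups bind first, and each group has to have the same length.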
There has got to be a way to sprinkle emojis into the documentation: https://github.com/under-Peter/OMEinsum.jl#learn-by-examples
Agreed, I also found that rather confusing.