Hacker News new | ask | show | jobs
by slver 1851 days ago
Is an array language a language where you only have arrays? That honestly sounds a bit odd.
4 comments

No, an array language is one in which all the built-in operations are applicable to arrays. Take addition as an example:

        4 + 4
    8
In array languages the same operator can be used for arrays; or equivalently you can say that the example above sums two arrays of length 1. In J, you can do this and expect it to work:

       4 4 4 + 2 2 2
    6 6 6
This is true for all the built-ins, and many user defined operations (as long as you don't fiddle with so called rank of the verb you're defining).

Numpy is close to that, but there's still a distinction between arrays and scalars, while in array langauges that distinction is often blurred:

       4 + 1 2 3
    5 6 7
Edit: in J, you can have atoms, or scalars, but you need to box them:

       (3;4;5)
    ┌─┬─┬─┐
    │3│4│5│
    └─┴─┴─┘
but then you can't do anything with them until you unbox them again:

       (3;4;5) + 3
    |domain error
    |   (3;4;5)    +3

       (+&4) each (3;4;5)
    ┌─┬─┬─┐
    │7│8│9│
    └─┴─┴─┘
(Examples straight from J REPL)
It seems almost as if it'd be more useful to not explicitly expose operators as applicable to arrays, but implement SIMD optimization for operator expressions in .map(function) and something like .binaryMap(right, function) ex.

    [1, 2, 3].binaryMap([10, 10, 10], (a, b) => a * b) // Produces [10, 20, 30]
This would be easier to optimize when compiled because the expressions can be simplified and mapped to the right SIMD instructions.
Unfortunately, I have no idea about the implementation and what optimizations are done in J or APL for which operations :( I know that there are hardcoded "fast path" expressions (particular combinations of operations) which have much better performance than more general expressions doing the same thing, so it might be that the optimization happens at that level.

OTOH, your example is very verbose when compared to J's version:

       1 2 3 * 10
    10 20 30
Plus, in J it generalizes to higher dimensional arrays:

       i. 3 3
    0 1 2
    3 4 5
    6 7 8
       (1 + i. 3 3) * 10
    10 20 30
    40 50 60
    70 80 90
My example is verbose in order to clearly communicate the principle. I can trivially shorten it to

    [1, 2, 3].map(a => a * 10)
    
if I wanted to literally carry out that task alone.

    [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]].flatMap(a => a * 10)
I think this is pseudo-code? `flatMap` and `=>` for lambdas look like Scala, but there are no array literals using `[]` there. Assuming you mean Scala-like semantics, your second example wouldn't work at all:

    scala> Array(Array(1,2),
                 Array(4,5)).flatMap(a => a * 10)
                                                       ^
           error: value * is not a member of Array[Int]
You would need to write it like this:

    scala> Array(Array(1,2),
                 Array(4,5)).flatMap(a => a.map(x => x * 10))
But then you'd get a flattened array of ints, not array of arrays of ints:

    res6: Array[Int] = Array(10, 20, 40, 50)
So to get the same result as

    (i. 2 2) * 10
You'd need to write:

    scala> Array(Array(1,2),
                 Array(4,5)).map(a => a.map(x => x * 10))
    res7: Array[Array[Int]] = Array(Array(10, 20), Array(40, 50))
...which is more verbose, no? :) EDIT: obviously, I mean "map in map" part, not the Array initialization!

The problem with list comprehensions and `map`, `filter` and friends is that they work very well for flat lists, or lists of lists in some specific circumstances (ie. when you can use `flatMap`). 2D arrays in the general case, and arrays with higher dimensions are really hard to work with using these primitives, unless you have some clever overloads, like what Clojure does. I think Haskell also has a solution for this, but I don't know it enough to comment further, unfortunately :)

What I can't understand is what would J do if I tell it to sum these two arrays:

    1 2 3 4 5
    6 7 8 
I.e. 5 elements and 3 elements
even less verbose:

(>: i. 3 3) * 10

...or

10 * >: i. 3 3

FWIW

That just removes a tiny tiny bit of dynamic dispatch overhead. Which is still needed anyways, as array languages can often dynamically switch between 1-bit, 8-bit, 16-bit, 32-bit integer (and 64-bit float) arrays, depending on the elements, completely transparently to the user.
Most compilers can inline statically defined closures in these contexts. And tracing JITs do this even when the closure is not define statically (but is stable).

It's more about allowing the SIMD goodness without the ambiguity and restrictions of "scalar operators work on arrays" implemented naively.

> without the ambiguity

There's no ambiguity! In J, all (with some exceptions) operations work on arrays, period. `1 + 1` is not an addition of two scalars, but instead of two arrays of length 1. As I mentioned, you can have scalars, but a) they still live in arrays and b) you can't do anything with them without unboxing. So there's no ambiguity, as far as I can tell.

Also, APL and J are implemented as intepreters, but that doesn't preclude using SIMD instructions when executing expressions. I'm 100% sure the operations are not "implemented naively" :)

> `1 + 1` is not an addition of two scalars, but instead of two arrays of length 1

I think that's true of (Dyalog) APL, but not of J [1]. APL follows the nested array model (that you describe), while J follows the flat array model that does have true scalars.

[1] https://aplwiki.com/wiki/Array_model#Flat_array_theory

Traditional array languages (APL,J) are not amenable to static compilation due to other dynamic issues, though. I think Dyalog APL is experimenting with a bytecode interpreter, but no JITs in sight either (that I know of). The very dynamic aspect of those languages makes that difficult. I'd love a compileable similar language, though. To replace R/Python for statistics that are sooooo annoying when exploring data due to verbosity.
That's not really the issue.

It's certainly true that, depending on how the code is written, you may not be able to optimize out the language implementation from the compiled program. But that just turns into a link against a library with support for the language for your hypothetical compiled program.

That said, the compiler being hypothetical is a very real obstacle. But even there the problem is not that the language can't be compiled -- it's that no one has bothered to implement a compiler for it.

Anyways, if you throw in some ML type inference, and some tools for characterizing the resulting code and its interfaces, you can generate code from an array language which is quite similar to the code you would get from a variety of other languages.

A compiler could inline SIMD with or without an operation having an explicit map, the dynamic type checks needed are gonna be the same. (and, since this is an array language, the tiny extra overhead of completely dynamic dispatch is gonna be small compared to the executed SIMD afterwards anyways)

You lose a lot of simplicity & brevity by requiring explicit mapping, and gain close to nothing.

That's the case in APL and J. K uses nested lists to represent arrays, and has non-lists (atoms). But the convention is that an n-times nested list is considered an n-dimensional array so even an atom is an array, with 0 dimensions.

There's a page on the various approaches to arrays in the APL family at https://aplwiki.com/wiki/Array_model .

Mostly yes.

Numbers (and character) are implemented as arrays with 0 dimensions. Text would be an array of characters with 1 dimension (the number of characters), and generally speaking the dimension of an array is a one dimensional list of non-negative integers. Many array languages also include an array type which is approximately the same as a C pointer to an array, with a bit of jargon thrown in to distinguish a reference to an array from the array itself.

Something like an SQL table in an array language would be implemented as a list of columns (rather than as a list of rows) and a corresponding list of column labels. This has some interesting benefits.

That said, functions in array language are typically not arrays (though they presumably would have a textual representation). So... not everything is an array.

From the J docs: https://www.jsoftware.com/help/dictionary/dx005.htm

  nub=: (i.@# = i.~) # ]
     5!:2 <'nub'
  +-------------------+-+-+
  |+--------+-+------+|#|]|
  ||+--+-+-+|=|+--+-+|| | |
  |||i.|@|#|| ||i.|~||| | |
  ||+--+-+-+| |+--+-+|| | |
  |+--------+-+------+| | |
  +-------------------+-+-+
... so J functions have an array representation, at least.
Yes, each J function has several array representations.
You could consider Pandas / numpy an array language, even though it's really a library for Python. The exposed API is IMHO what's important.
I agree, but you have to take extensibility into account. You either have an extensible language and extend it[0], you make it an array language to the core, or you just have a few slightly convenient functions (not a language). (I don't have enough experience to judge which category Pandas / numpy are in)

[0]: https://github.com/phantomics/april