by radford-neal 2708 days ago

Yes, there are problems of this nature in R, but I don't think the specific case you cite is really a problem. The ultimate source of the difficulty is the R (really S) decision to not have scalars - only vectors that happen to have length one. But given that, writing m[1,] is normally intended to get a simple vector, not a matrix.

The problem really arises when you write m[1:n,] with the intent of getting a matrix with n rows, and n happens to be 1 at the moment, so you get a simple vector instead.
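To make this concrete, here is a small sketch in standard R (not pqR). The matrix m and the values of n are made up for illustration, and drop = FALSE is the usual base-R workaround, not something mentioned in the comment above.

```r
# Hypothetical 3 x 2 matrix, just for illustration.
m <- matrix(1:6, nrow = 3)

n <- 2
two_rows <- m[1:n, ]                     # still a matrix: 2 rows, 2 columns

n <- 1
one_row <- m[1:n, ]                      # dimension dropped: plain vector of length 2

# Base-R workaround: ask explicitly to keep the dimensions.
one_row_kept <- m[1:n, , drop = FALSE]   # 1 x 2 matrix

print(dim(two_rows))
print(dim(one_row))
print(dim(one_row_kept))
```

The catch is that drop = FALSE has to be remembered at every call site where n might turn out to be 1.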

This is a problem that I have addressed in my pqR version of R, available at pqR-project.org.

In pqR, there is a new sequence operator, .., which produces a 1D array, not a simple vector. And in pqR, when a 1D array is used as an index, the dimension is not dropped, even if the array happens to be of length 1. So m[1..n,] produces a matrix even if n is one.

Well, most of the time. There's also the problem that m might have only one column, so the result will get dropped down to a simple vector for that reason. To solve this, pqR has a new way of indicating a missing argument, with _, which also indicates that you don't want the dimension dropped. So you can now get exactly the behaviour desired by writing m[1..n,_].
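For comparison, the one-column case looks like this in standard R. This is only a sketch: m1 and n are hypothetical, drop = FALSE is the base-R analogue as far as I know, and the pqR form appears only in a comment because standard R will not parse it.

```r
# Hypothetical one-column matrix (3 rows, 1 column).
m1 <- matrix(1:3, ncol = 1)
n <- 2

dropped <- m1[1:n, ]                 # column dimension dropped: vector of length 2
kept    <- m1[1:n, , drop = FALSE]   # dimensions kept: 2 x 1 matrix

# In pqR, per the description above, the same intent would be written
#   m1[1..n,_]
# which is left as a comment here since it is not valid standard R.

print(dim(dropped))
print(dim(kept))
```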

This is all backwards compatible, except that it's necessary to disallow use of .. in the middle of an identifier, so that a..b won't be taken as the name of a variable.
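The incompatibility is easy to see in standard R, where a name with two consecutive dots is perfectly legal. The variable name below is just the one from the example above, and the snippet is meant for standard R, not pqR.

```r
# Standard R: a..b is parsed as a single identifier.
a..b <- 3
total <- a..b + 1   # uses the variable named a..b, not a sequence from a to b

print(total)
```

Code that uses such names is what would need changing once .. becomes an operator.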

thom 2708 days ago

Ah, I've been intrigued by pqR. Lately I've wondered if there couldn't be a version of dplyr implemented as transducers, if only R... wasn't R. How feasible might it be for some future R runtime to be truly multithreaded, even if it breaks some existing functionality?

radford-neal 2707 days ago

Well, pqR already uses multiple threads automatically to parallelize some numerical operations - e.g., for long vectors a and b, (a * b + a / b) might be computed with three threads, one computing a*b, one computing a/b, and one adding the results of these as they become available, or exp(a) might be computed with two threads each handling part of a.

If you mean threads programmed explicitly in R, though, with fine-grained, low-overhead communication using shared memory, I think it would be quite challenging to modify the current implementation to support a language extension for this. It may not be impossible for some sorts of extensions.
