by Decabytes 2330 days ago
My guess would be leveraging all the libraries and packages in R, without having to rewrite them in D.

Here's what I'm not getting:

Most of the tasks that require heavy computation in R are done through C/C++/Fortran APIs - why can't D interface with them without the intermediate R layer?

The ones that are R-native - visualisation/dataframes - are best done through R itself - what's the use case for doing that through D at all?

The intermediate R layer is often a well-thought-out abstraction, so it makes a lot of sense to reuse it. At different points in my career I have written skunkworks wrappers for rpart (a decision tree learner) and glmnet (generalized linear models with elastic net regularization). While it's true that some subset of the features of these libraries exists in other languages (my primary working language is Python), those implementations are not as feature-rich. To take the first example, rpart offers the concept of "surrogate splits"[1], which scikit-learn's decision trees lack. scikit-learn also doesn't support categorical features (you need to encode them as one-hot vectors), whereas rpart does.
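For readers who haven't run into this: "encoding into one-hot vectors" means turning one categorical column into one 0/1 column per level. Below is a minimal plain-Python sketch of that transformation (the function name one_hot is made up for illustration; in practice you would use scikit-learn's OneHotEncoder or pandas.get_dummies). A tree grown on these columns can only separate one level from the rest at each split, whereas rpart, given a factor, can send any grouping of its levels to each side of a split.

```python
def one_hot(values):
    """Expand a list of category labels into one-hot rows.

    Returns (levels, rows): levels is the sorted list of distinct
    categories, and each row holds 1.0 in the column of its level
    and 0.0 everywhere else.
    """
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0.0] * len(levels)
        row[index[v]] = 1.0  # mark the column for this value's level
        rows.append(row)
    return levels, rows


levels, rows = one_hot(["red", "green", "red", "blue"])
print(levels)   # ['blue', 'green', 'red']
print(rows[0])  # [0.0, 0.0, 1.0]
```

One categorical column with three levels becomes three numeric columns, and the learner no longer knows they belong together.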

In short, the intermediate layers often give you well thought out features.

In some cases, you might not even have a corresponding library in your language. For example, if you wanted interaction terms in a linear model that respect hierarchy, there aren't many options around, but R has glinternet [2].
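To make "respect hierarchy" concrete: glinternet's CRAN description says it fits pairwise-interaction models that satisfy strong hierarchy, meaning an interaction gets a nonzero coefficient only if both of its main effects do too. The sketch below only checks that rule on an already-selected set of terms (the function and term names are hypothetical). It does not fit anything; glinternet builds the constraint into its group-lasso fit.

```python
def satisfies_strong_hierarchy(main_effects, interactions):
    """Return True if every interaction (a, b) has both a and b
    among the selected main effects (the strong hierarchy rule)."""
    selected = set(main_effects)
    return all(a in selected and b in selected for a, b in interactions)


print(satisfies_strong_hierarchy({"x1", "x2"}, [("x1", "x2")]))  # True
print(satisfies_strong_hierarchy({"x1"}, [("x1", "x2")]))        # False
```

A generic lasso on a design matrix with interaction columns can easily select x1:x2 while dropping x2, which is exactly the kind of model this check rejects.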

[1] https://stats.stackexchange.com/questions/50310/how-does-rpa...

[2] https://cran.r-project.org/web/packages/glinternet/index.htm...

I can think of a few different use cases:

1) Use a library written in raw R that is difficult to implement in another language. It can be time-consuming to replace the work done in raw R.

2) Use a library that relies on a C/C++/Fortran API but adds error checking or further calculations in raw R on top of it. It can be time-consuming to replace the work done in raw R.

3) Use a library where the R implementation has an ecosystem built around it. It can be time-consuming to replace the R ecosystem.

4) One way that I used this in the past was with Stan. Stan is written in C++, but it does not provide a C++ API: you either call it from the command line with cmdstan or use an interface like rstan or pystan. The problem with the command line is that you have to write the data to disk first, which you don't have to do with rstan. My recollection is that calling rstan from D was faster than using cmdstan for large data sets. So basically, I can process all the data in D, which tends to be faster than R or Python, and then pass it off to Stan.

5) When attempting to write a D library with features similar to an R package, one can create a unit test that calls the R version and checks that both produce the same results.
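A minimal sketch of point 5, written in Python rather than D for brevity. The reimplemented function is a sample variance, which has to use the n - 1 denominator to match R's var(). Here the R result for var(c(1, 2, 3, 4)), which is 5/3, is hard-coded as an assumption; in the setup described above, the test would get it by calling R directly. The names sample_variance and R_REFERENCE are hypothetical.

```python
import math


def sample_variance(xs):
    """Sample variance with an n - 1 denominator, matching R's var()."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Assumed reference value: R's var(c(1, 2, 3, 4)) is 5/3.
# In a real suite this would come from calling R, not a constant.
R_REFERENCE = 5 / 3


def test_variance_matches_r():
    got = sample_variance([1.0, 2.0, 3.0, 4.0])
    # Compare with a tolerance, not ==, since two implementations
    # may accumulate floating-point sums in a different order.
    assert math.isclose(got, R_REFERENCE, rel_tol=1e-12)


test_variance_matches_r()
```

The same pattern scales to whole model fits: generate inputs, run both implementations, and compare coefficients within a tolerance.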

From the article:

>This article shows how to embed an R interpreter inside a D program, pass data between the two languages, execute arbitrary R code from within a D program, and call the R interface to C, C++, and Fortran libraries from D.

> Most of the tasks that require heavy computation in R are done through C/C++/Fortran APIs - why can't D interface with them without the intermediate R layer?

You absolutely can do that, but it's generally not a fun experience. Most of the time the overhead of calling into R is trivial. In that case, take advantage of the convenience of R and get on to other things. There's also a lot of pure R code that you don't have any other way to call.

For example, the matrix algebra and array manipulation might be written in C++, while the statistical analysis is done in R, potentially using several different matrix algebra routines.

I have no idea why you would not use R (or Python) «at the top», though. R might not have great libraries for network protocols (REST, etc.), as it's not a general-purpose language but a more stats-oriented one.

> R might not have great libraries for network protocols (REST, etc), as it’s not a general language

No idea how it compares to D, but REST in R is pretty straightforward with the httr library. And in general, R holds up pretty well as a general-purpose language.

If you're doing a simulation, for instance, and you're missing a piece of functionality available in R, you can bring in just that piece. D's a nice language, and if you decide you want to use it, it's great to know that every line of R code you've ever written and every R library you've ever called is still available.