by notafraudster 1798 days ago
What's the argument against using R and dropping into Rcpp for very limited tasks? I helped write a very widely used R modelling package and while I wasn't doing anything on the numerical side, we seemed to get great performance from this approach -- and workflow-wise it wasn't too dissimilar to 25 years ago, when I had to occasionally drop in x86 assembly to speed up C code!

(Not a hater of Julia at all; I very much think it's a cool language with an increasingly vibrant ecosystem, and I have been consistently impressed when Julia devs have spoken at events I've attended.)

6 comments

Interoperability between libraries that expect your code to be pure R / pure Python. If you use Rcpp or Cython or the CPython C API, then you lose much of the magic behind the language that enables the cool (but frequently slow) features. My biggest pain point in this situation: you cannot use SciPy or Pillow or Cython code with JAX/PyTorch/TensorFlow (except in a very limited fashion).

Differential equation solvers that need to take a very custom function working on fancy R/Python objects are another example of the clumsiness of these drop-to-C-for-speed languages. It works, and as a performance nerd I enjoy writing such code, but it is clumsy.
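To make the solver point concrete, here is a minimal sketch, assuming a hypothetical fixed-step Euler integrator written in pure Python (`euler` and `decay` are illustrative names, not from any library). Because the solver only relies on `+` and `*`, it accepts whatever state type the user's function produces; a solver compiled against C doubles would need extra glue to do the same.

```python
from fractions import Fraction


def euler(f, y0, t0, h, steps):
    """Fixed-step explicit Euler; only needs y + h * f(t, y) to be defined."""
    t, y = t0, y0
    for _ in range(steps):
        y = y + h * f(t, y)
        t = t + h
    return y


def decay(t, y):
    # Right-hand side of dy/dt = -y; works for any type that supports negation.
    return -y


# Same solver, two different state types: exact rationals and plain floats.
exact_rational = euler(decay, Fraction(1), Fraction(0), Fraction(1, 10), 10)
as_float = euler(decay, 1.0, 0.0, 0.1, 10)
```

The rational run stays a `Fraction` all the way through, which is the kind of "fancy object" support that gets lost once the inner loop moves to a compiled extension expecting raw doubles.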

That type interoperability is trivial in Julia.

Once your Rcpp code is compiled, it's almost indistinguishable from base R (when you're calling it). All R functions eventually end up calling R primitives written in C, and Rcpp just simplifies the process of writing and linking C/C++ code into the R interpreter.

The only difficulty with Rcpp-based R packages is you have to ensure the target system can compile the code, which means having a suitable compiler available.

I wonder how much it differs from Python's use of C or Cython (I have only superficial R skills). The prototypical example of why Python's C extensions prevent interoperability is how the introspection needed by JAX or TensorFlow (e.g. for automatic GPU usage or automatic differentiation) fails when working on SciPy functions implemented in C.
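A toy illustration of that failure mode, using only the standard library: the `Dual` class below is a hypothetical stand-in for the tracer objects that JAX or TensorFlow pass through user code, not their actual mechanism. A pure-Python function accepts the tracer because it only uses overloaded operators, while a C-implemented function such as `math.sin` rejects it.

```python
import math


class Dual:
    """Dual number value + deriv*eps (eps**2 == 0) for forward-mode differentiation."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def derivative(f, x):
    """Differentiate f at x by pushing a Dual through it."""
    return f(Dual(x, 1.0)).deriv


def pure_python_poly(x):
    # Only uses + and *, so it works on floats and Duals alike.
    return 3 * x * x + 2 * x + 1


def c_backed(x):
    # math.sin is implemented in C and insists on a real number.
    return math.sin(x)


poly_slope = derivative(pure_python_poly, 2.0)  # analytically 6x + 2

try:
    derivative(c_backed, 2.0)
    c_call_traced = True
except TypeError:
    # The C function cannot see inside the Dual, so tracing stops here.
    c_call_traced = False
```

Real frameworks work around this by shipping their own reimplementations of such functions, which is why code calling into an arbitrary C extension usually falls outside what they can differentiate or move to a GPU.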

For instance, I imagine there is an R library that makes it easy to automatically run R code on a GPU. Can that library also work with Rcpp functions?

> it's almost indistinguishable from base R (when you're calling it).

I am very surprised by this, given how extremely dynamic R is. It has things like lazy evaluation, where an argument's expression can be captured with substitute and rewritten before it is evaluated, which I am sure some packages are using in scary and beautiful ways.

I think the argument is that most R users don't know C++. So Julia avoids the "2 language problem" that you get with modern scientific computing.
Not much of an argument at all, if you ask me. There's definitely a benefit to only having to learn a single language (rather than R and C++), but the library/package ecosystem in R is hard to beat; unless you're doing truly bespoke computational work, the number of mature statistical libraries/packages in R is unmatched. Rcpp's syntactic sugar means most slow R bottlenecks can be written in C++ almost verbatim, but without the interpreter's performance penalty. One of R's best and most under-emphasized features is its straightforward foreign-function interface: it's easy to create bindings to C/C++/Fortran routines (and Rust support is coming along as well).

I've been impressed with Julia, but it's hard to beat 25 years of package development.

Same argument as Python.

In other words, you can (empirically) get a lot done that way, but there is always friction.

One thing I don't like about the two-language approach: the deployment story gets more complicated, it seems?

In my case I went to deploy on a musl system, and the two-language parts were just a pain to get up and running.

Conversely, everything that was native Python ran fine in a musl-based Python container.

Your native Python code also just moves nicely between Windows / Linux / etc.

The development story is as complicated as the tooling makes it. With good tooling that, e.g., minimizes the amount of glue code and/or makes it easy to integrate building the native parts, the development story doesn't have to be much more complicated.
I think that's just being clumped in with "Use C++," which he mentioned as an option, and the FFI adds a lot of overhead for granular data. Julia just works fast. My only friction has been offline development, which isn't well supported yet.