by B1gred 3862 days ago
I program in R for 80% of my day. I have experience with all the major alternatives but keep returning to R. It has one huge flaw, being slow, but is otherwise fantastic to work with and has a vibrant community.

The bigger issue is that while R is liked by statisticians, it lacks many of the features needed for software development. We run into difficulties with logging, version control of packages, speed, Docker image size, build time, etc. But even with these drawbacks I keep coming back, because I develop faster and better in R.

2 comments

I agree, but this stuff is getting better. I'm actually considering breaking away from the Rocker-derived stuff because the images get so big I'm pretty sure I could maintain a faster build myself. Problem is I haven't used R locally in Linux for a long time and the split off between things in the OS package manager and R can be a bit tricky with dependency management.

packrat has helped a lot with version control of packages, but it still doesn't quite feel like the right solution.

I've been really impressed by how far R has come in these areas over the last 5 years though, so like you I keep coming back. By the time I start getting over the learning curve elsewhere, R seems to have developed better tooling for what I want to accomplish anyway, and I can come back and write cleaner, clearer, better software faster in R.

I agree with your assertion that R is slow, yet quick to develop in.

I recently had to loop through 1.3 GB of data (5000 files) and merge just one column from each file into a new dataset. It took ~2 hours, yet the loop was just ~5 lines of code.

This task sounds almost uniquely poorly suited for R, but this has gotten better. For example, adding a column (did you append to the right or do an actual merge/join?) used to require copying the previous table but doesn't any more.

I wonder if you tried doing things like:

* preallocate a list, then do.call(cbind, your_data)
* Same as above, but with some of the faster alternatives to cbind like dplyr::bind_cols or data.table::cbind
* Use data.table, which has far faster joins than base R (so does dplyr) if you were doing a true merge/join
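For what it's worth, here is a minimal base R sketch of the first suggestion. Everything in it is made up for illustration: the temp directory, the three tiny CSV files standing in for the 5000 real ones, and the column name `value`. It only shows the shape of the approach (fill a preallocated list, bind once at the end), not how the original loop was written.

```r
# Hypothetical setup: a few small CSV files in a temp directory,
# standing in for the real input files.
tmp_dir <- tempfile("cols_")
dir.create(tmp_dir)
for (i in 1:3) {
  write.csv(data.frame(id = 1:4, value = i * (1:4)),
            file.path(tmp_dir, sprintf("file_%02d.csv", i)),
            row.names = FALSE)
}

files <- sort(list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE))

# Preallocate the list so it is never grown inside the loop.
cols <- vector("list", length(files))
names(cols) <- tools::file_path_sans_ext(basename(files))

for (i in seq_along(files)) {
  # Keep only the column of interest ("value" is an assumed name).
  cols[[i]] <- read.csv(files[i])[["value"]]
}

# Bind once at the end instead of once per file.
combined <- as.data.frame(do.call(cbind, cols))
print(dim(combined))
```

This assumes every file has the same number of rows in the same order; if the rows have to be matched on a key, it is a real join and the third bullet applies instead.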

If it was truly just pulling one column from each file and combining them into a single file, these kinds of tasks are much better handled with UNIX tools, in my experience.

It is slow. And that is OK. R will very rarely beat any other language. Usually it is not off by much, but code written by a novice using for loops instead of apply functions can be 100-1000x slower.
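A rough sketch of the loop-versus-apply gap, assuming the novice pattern in question is growing the result inside the loop (that is my assumption, not something stated above). The function names and the input size are made up, and timings will vary by machine and R version, so none are quoted here.

```r
n <- 1e4
x <- seq_len(n)

# Novice pattern: grow the result one element at a time,
# which reallocates the vector on every iteration.
grow_loop <- function(x) {
  out <- c()
  for (v in x) out <- c(out, v^2)
  out
}

# apply-family version: vapply allocates the full result up front.
with_vapply <- function(x) vapply(x, function(v) v^2, numeric(1))

# Fully vectorised version, usually the fastest of the three.
vectorised <- function(x) x^2

# All three produce the same values.
stopifnot(isTRUE(all.equal(grow_loop(x), with_vapply(x))))
stopifnot(isTRUE(all.equal(with_vapply(x), vectorised(x))))

# Compare elapsed times on your own machine.
print(system.time(grow_loop(x)))
print(system.time(with_vapply(x)))
print(system.time(vectorised(x)))
```

A for loop that writes into a preallocated vector is typically much closer to vapply than to the growing version, so most of the penalty comes from the repeated reallocation rather than from the loop itself.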

Another example is the immutable data structures that make R a memory hog, creating copies of data everywhere. But again, if you plan well and use the 'best' solutions you can avoid the giant pitfalls, though you will rarely beat an equally well-written Python equivalent.

Post R 3.1 there are far fewer deep copies (e.g. modifying a list or adding a column to a data.frame no longer copies the whole thing like it used to).