Hacker News
by indeed30 1545 days ago
I love R more than any other language I have ever used. Perhaps more than any piece of software I've ever used. All of these points are valid, and yes, it's messy, and if you try to write the same type of code that you would in Python, it will frustrate you.

And yet... it somehow works. It makes data analysis and statistical modelling a pleasure. It somehow gives off a sense of lightness, and makes it easy to investigate and explore. I would guess I am genuinely 2x as productive in R as I would be in Python on similar tasks.

I know it's not a "proper" language, but I think that, maybe, not everything has to be exactly like "proper" software engineering?


I very much agree with this. I use Python for (different types of) data analysis too, and in Python in particular it feels like the ratio of "boilerplate" to "science" is skewed heavily toward "boilerplate". R manages to abstract this away very effectively, as the article highlights.

The beauty of R is that you can write one line of code and use some hot-off-the-PhD-thesis cutting-edge-just-published-in-J.-Stat.-Soft-chunk of statistical analysis in your totally different, completely wacky problem, and it's fast, and (by and large) works.

Of course, that's its biggest problem as well. Scientifically, it will quite happily give you a 150 mm howitzer to aim at your foot, assuming you know best.

> hot-off-the-PhD-thesis cutting-edge-just-published-in-J.-Stat.-Soft-chunk of statistical analysis

I think you mean "poorly-documented-cobbled-together-under-deadlines-never-to-be-maintained by someone who has no idea of software principles". Very few labs have a dedicated software engineer to actually turn this software into a usable/hackable tool, let alone maintain it.

That's an unnecessarily negative stance. Not every algorithm needs to be scalable and over-optimized to be useful in most cases. And if something becomes really useful in R, it ends up being reimplemented in more efficient ways down the road.
No, but it does need to be tested and reliable.
Coming from Matlab, I have the opposite feeling.

I truly, genuinely dislike the language. I think it's very productive, and I appreciate that Matlab costs an arm and a leg (and god help you once you start paying for some of the nicer packages on top) - but Matlab has spoiled me immensely on the language front.

To me, Matlab feels like a language that was designed with an intent to appeal to folks with some understanding of traditional procedural programming, but nudged into treating matrices as first class citizens.

R feels like a language that was built for people who were using Excel, and have never written a line of code in their life - it's riddled with completely unintuitive, frustrating, intentionally obtuse operators and terms for things that have perfectly fine definitions in normal programming.

The difference is that I have 20+ years of programming experience (including quite a bit of functional programming) that I can easily port over to Matlab, and which becomes literal baggage trying to use R. The end result is that I will use R, but I basically always walk away frustrated and infuriated, even when the problem is solved.

> R feels like a language that was built for people who were using Excel

The S language predates the first release of Excel by 11 years.

> and which becomes literal baggage trying to use R

I've had the opposite experience. My experience was that having a broad array of programming experience made it easier to pick up the weirder corners of R. It became more likely that I'd seen *something* similar to that construct in the past. The converse has also been true. Seeing all the weird corners in R has made it easier to pick up new concepts in other languages & paradigms as it's been more likely I've seen *something* similar from R.

Using pipes and tidyverse/data.table allows for great things in R, and has a strong functional feel. It can be quite beautiful reshaping data, splitting, mapping, recombining and plotting it.

It doesn't work well at all with a procedural approach.
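For illustration, a minimal sketch of that split / map / recombine style using only base R (the native pipe `|>` needs R 4.1 or later) and the built-in `mtcars` dataset. The grouping variable and the per-group model are arbitrary choices for the example, not anything from the comment above:

```r
# Split the built-in mtcars data by cylinder count (split),
# fit one linear model per group (map),
# then stack the coefficients into a matrix (recombine).
coefs <- split(mtcars, mtcars$cyl) |>
  lapply(function(d) coef(lm(mpg ~ wt, data = d))) |>
  do.call(what = rbind)

# One row per cylinder group ("4", "6", "8"),
# with "(Intercept)" and "wt" columns
print(coefs)
```

The same shape carries over to dplyr's `group_by()` plus `summarise()` or to data.table's `by =`, which mostly change the syntax and, in data.table's case, the speed.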

> R feels like a language that was built for people who were using Excel,

I don't think so. Most people who come to R after years of Excel find it just as alien as you do.

I recall when the pipe operator was first being proposed the argument for it was that it'd enable workflows that felt more like Excel. The implication being that indeed, base R is alien to an Excel user.

I also recall my pushback was along the lines of "who on earth would want that". Yeah, it's a good thing I'm not the person coming up with these things :)

> I recall when the pipe operator was first being proposed the argument for it was that it'd enable workflows that felt more like Excel.

Where are you getting that from? To start with, the pipe operator has been independently reinvented multiple times in R, and neither ‘magrittr’ nor ‘dplyr’ was the first to introduce it. And (at least when I was exposed to it), the pipe operator had nothing whatsoever to do with Excel. Instead, it was an attempt to bring the composability of UNIX shell pipelines and Haskell's function composition into R.

You have me second-guessing myself; perhaps I’m conflating it with the convo around dplyr in its early days.

EDIT: I found the conversation in question, but it involved deleted tweets, and those deleted tweets are the ones that reference the package name. Sigh. It was just after the release of magrittr and several months after dplyr's.

> I recall when the pipe operator was first being proposed the argument for it was that it'd enable workflows that felt more like Excel.

I have no idea where you get that impression, most Excel power users I have met take a long time to understand how to use the pipe operator in R.

How do you feel about the pipe operator these days?
I haven't used R enough in the last 10 years to have an R-specific opinion. And to be honest, it was more of an uninformed statement on my part: it was an "ew, Excel" reaction rather than any thought about the underlying workflow.

In the intervening time I've become a large advocate for the pattern of chained operators. So I'd imagine I'd enjoy piping in R. And if that means I'm emulating a common Excel workflow, that's fine. I won't have the childish response of "ew, Excel" :)

100% this :)
Ha ha, I love that this is your only comment here! Thanks for all your work on R.

I came here with sleeves rolled up to defend the language, but was pleasantly surprised to find it was already being done much better than I could have.

It's interesting to see the reaction R elicits from some programmers. I think it's frequently misunderstood, and R needs to be used in a particular way to allow it to fly.

When I've tried to recreate analyses in Python or Julia, they have nowhere near the fluency of R. You won't see this if you're messing around with if statements and other procedural methods that are better suited to other languages; it shows when you're crunching data for analysis and visualising the results graphically.

I also understand that it's R's Lisp-y-ness that allows us to have the tidyverse in the first place.

Question for Hadley - there have been a couple of projects to fuse the speed of data.table with the tidyverse. What do you think of this aim, and are you tempted to change the tidyverse to reach the speed of data.table, or would that require too much of a fundamental change?

Have you seen https://dtplyr.tidyverse.org? It gives you the syntax of dplyr and (almost all of) the speed of data.table.
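A minimal sketch of what that looks like, assuming the dtplyr and dplyr packages are installed from CRAN: `lazy_dt()` wraps a data frame, dplyr verbs are translated into data.table code, and nothing is computed until you ask for a result with `as_tibble()`. The particular filter and summary are arbitrary examples:

```r
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Wrap mtcars so that dplyr verbs are recorded and translated to data.table
lazy <- lazy_dt(mtcars)

query <- lazy |>
  filter(hp > 100) |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg))

show_query(query)           # prints the generated data.table expression
result <- as_tibble(query)  # evaluation happens here, via data.table
print(result)
```

The code reads like ordinary dplyr; the only visible differences are the `lazy_dt()` wrapper at the start and the explicit `as_tibble()` at the end.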
Oh, I feel a bit silly now! I'd seen a couple of attempts to combine the two, but didn't realise this one was official. It looks great. I have an analysis coming up that borders between dplyr and data.table in terms of size, so will check it out then!
There is also the tidytable package. But dtplyr works really well. Have used it in a couple of shiny apps that wrangle some heavy input files.
Before downvoting this short one-liner, make sure to check who wrote it!
Thanks for all your work on the tidyverse!
Not a lot of people realize that Hadley is a minor celebrity in many scientific fields thanks to the tidyverse! Thanks for all the work.
Bravo!
Yeah, and the growing user base, widening ecosystem, and continual stream of analysis packages being written only in R suggests that lots of others agree.

An important factor not often mentioned is that I think R really helps individual developers/very small teams to be productive.

I feel the exact same way! I've used R for the past decade. Once you learn the philosophy behind it, it just works. Yesterday my boss asked me a question about a dataset and I wrote code to analyze it while talking through the problem in real time.
The main issue I've had is speed. As soon as you have problems that can't be vectorized, models that take 30 hours to run in R take 30 minutes in Python.
In my limited experience, problems that cannot be vectorized really shouldn't be written in Python either (assuming you mean Python loops). But indeed, the edge Python has is the ease of drop-in solutions like Numba, which let you keep writing Python syntax while the code itself is compiled rather than run by the Python interpreter.
Mind giving an example? The only time I faced this was due to an autoregressive model, which was super easy to delegate to C++.

I've been working with Python for the last year and appreciate how much it helps with general IT problems, but I would still stick to R for statistical/data analysis.
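To make the non-vectorizable case concrete: in an AR(1) recurrence each value depends on the previous one, so it can't be written as a single vectorized arithmetic expression. A minimal sketch, assuming a simple AR(1) with an illustrative coefficient `phi` and series length, comparing a plain R loop with base R's `stats::filter(..., method = "recursive")`, which evaluates this kind of linear recursion in compiled code. It only covers linear recursions like this one, so it is not a general substitute for dropping down to C++:

```r
set.seed(1)
n   <- 1e5
phi <- 0.8        # illustrative AR(1) coefficient
e   <- rnorm(n)   # innovations

# Plain R loop: x[t] = phi * x[t - 1] + e[t], with x[0] taken as 0
ar1_loop <- function(e, phi) {
  x <- numeric(length(e))
  x[1] <- e[1]
  for (t in 2:length(e)) x[t] <- phi * x[t - 1] + e[t]
  x
}

# Same recursion, evaluated in compiled code by stats::filter().
# The stats:: prefix avoids dplyr::filter() masking it if dplyr is loaded.
ar1_filter <- function(e, phi) {
  as.numeric(stats::filter(e, phi, method = "recursive"))
}

x_loop   <- ar1_loop(e, phi)
x_filter <- ar1_filter(e, phi)
all.equal(x_loop, x_filter)   # TRUE

system.time(ar1_loop(e, phi))
system.time(ar1_filter(e, phi))
```

Timings depend on the machine and R version, so no numbers are given here.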

Example, please?

This seems highly unlikely, based on my 20+ years with R. Yes, using wrong data structures/algorithms can lead to slow code, but switching languages won't fix this.

Rprof and microbenchmark are your friends if you really need to optimize your code.

And (as in Python, and as several others have pointed out), if you have something especially challenging, write it in C/C++/Fortran instead and link it to R.
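A sketch of that profile-then-benchmark workflow, assuming a deliberately slow function `slow_sumsq()` (hypothetical, made up for illustration). `Rprof()` and `summaryRprof()` ship with R in the utils package; microbenchmark is a CRAN package, so the benchmark step only runs if it is installed:

```r
# Hypothetical slow function: grows a vector inside a loop,
# copying it on every iteration (quadratic work overall)
slow_sumsq <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  sum(out)
}

# Vectorized equivalent
fast_sumsq <- function(n) sum(seq_len(n)^2)

# 1. Profile: where is the time going?
prof_file <- tempfile()
Rprof(prof_file, interval = 0.005)
invisible(slow_sumsq(2e4))
Rprof(NULL)
print(head(summaryRprof(prof_file)$by.self))

# 2. Benchmark alternatives (optional: needs microbenchmark from CRAN)
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    slow = slow_sumsq(2e3),
    fast = fast_sumsq(2e3),
    times = 20
  ))
}
```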

In both languages, you can write/use C extensions.
You can insert C code into R very easily when you need more speed.
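One common route is the Rcpp package (from CRAN, and it also needs a working C++ compiler), so strictly speaking this is C++ rather than C. Its `cppFunction()` compiles a function from a string and binds it to an R name. A minimal sketch, using a one-sided CUSUM as an illustrative loop where each step depends on the previous one:

```r
library(Rcpp)

# One-sided CUSUM: s[t] = max(0, s[t - 1] + x[t]), with the value
# before the start taken as 0. Compiled once, then callable from R.
cppFunction('
NumericVector cusum_cpp(NumericVector x) {
  int n = x.size();
  NumericVector s(n);
  double prev = 0.0;
  for (int t = 0; t < n; ++t) {
    double v = prev + x[t];
    prev = v > 0.0 ? v : 0.0;   // reset to zero when the sum goes negative
    s[t] = prev;
  }
  return s;
}')

cusum_cpp(c(1, -3, 2, 2, -1))   # 1 0 2 4 3
```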
Thanks for expressing how I feel about R so succinctly.