by glofish 2382 days ago
R, unfortunately, is also one of the most ill-designed yet popular programming languages in existence. I would strongly recommend people to steer away from it. If you cherish your sanity stay away from using R!

Moreover, after seeing what my colleagues publish as scientific R programs, I came to believe that science itself is bottlenecked by the large-scale adoption of R and the sloppy, inconsistent and bug-infested programming practice that it encourages.

R does a few things well - cross-platform, plotting works on all platforms, packaging works well. But for actually programming it is atrocious.

13 comments

What is "actually programming"? Is fighting with your package manager (I'm looking at you, Python) "actually programming"? Is re-implementing functionality that exists elsewhere in the hipster language du jour (e.g. Rust, Julia) "actually programming"? I totally concede that R itself is a fairly unremarkable lispy mostly functional programming language. What makes it stand out is its emphasis on immutable, in-memory, array-based data structures. This means that 1) it's very straightforward to wrap highly performant C/C++/Fortran libraries 2) despite being a dynamically typed language, it's usually quite straightforward to reason about the type/shape of the inputs and outputs of a function 3) individual functions from one package can often be easily combined with functions from another package. I totally get it if any of this isn't your thing, but to write off a whole ecosystem as "ill-designed" (without literally any argument besides "my co-workers, and scientists in general, are stupid"), is pretty lazy.
Rust was developed out of necessity by C++ users, it is not a hipster language.

I prefer using Rust rather than paying hundreds of thousands of dollars on a static analysis tool for C++.

What are you on about? R has a specific use, which is statistics and data science. For those purposes it reigns supreme. Even for developing dashboards, R-Shiny is a breeze compared to Python's Dash. R is awesome.

Also, to quote a comment[0] of yours from 2 days ago:

What you are saying is that since you prefer something everyone should do the same.

And what you prefer is the correct choice for everyone ...

[0] https://news.ycombinator.com/item?id=21774917

There are many great things about R that you can point out - Shiny is not one of them.
In terms of integration of a first-rate statistics language with anything resembling a web interface, not only is Shiny great, but it pretty much reigns supreme.
What's wrong with Shiny? It's reactive. It's efficient. It's straightforward. It works well. It works with tons of great plotting libraries.

It's not perfect, but I'm struggling at finding bad things to say about it.

You quote a reply to a post that advocated for all education to be segregated. That means making a choice for others.

In this post, I make a recommendation of staying away from R. It is not even a remotely similar context.

R is a scripting language, most of the underlying infrastructure is written in Fortran, C and C++. R is also designed for stats, not writing software. Of course you're going to have a hard time if you treat it like a real programming language. That's why R provides easy interop with other languages.

But R also makes a lot of the tasks you do in data science far easier than it would be in a 'real language'.

That "easy" has a huge price: the language is chock-full of unexpected behaviors, inconsistencies and gotchas, all in the name of making it "easy".
Care to share any language that does things better without a steep learning curve? R is popular for very, very good reasons. You can pick it up and be productive with it in no time, even with the "gotchas"...
Raku has less of a learning curve than R, does things better, and is quicker and easier to pick up and be productive in, with barely any gotchas if any at all.

Though, to be fair, I like R. It has good plotting libraries, and as much as it gets bashed I like RStudio too. In comparison, Raku's ecosystem is brand new.

R is a Lisp variant with a ton of syntactic sugar and vectorization of the basic data types. Calling it a scripting language would indicate that you haven't looked at it much as a language.

Sure, most of the matrix operations and tight loops are implemented in FORTRAN or C++, but that's for performance reasons. The same would happen with Python, and I don't think it would be fair to call Python a scripting language either; it also has a ton of Lisp-like qualities.

I find I spend most of my time debugging R, which is astoundingly difficult since it doesn’t report line numbers on errors. Most of my code is in C++, so that helps, albeit it’s still overly complicated to start up R in gdb. Amazingly Julia isn’t much better when it comes to error reporting either.

  # cat /tmp/test.R
  x <- 1:10
  y <- 1:20
  plot(x, y)

  # R --quiet
  > source("/tmp/test.R")
  Error in xy.coords(x, y, xlabel, ylabel, log) :
    'x' and 'y' lengths differ
  > traceback()
  8: stop("'x' and 'y' lengths differ")
  7: xy.coords(x, y, xlabel, ylabel, log)
  6: plot.default(x, y)
  5: plot(x, y) at test.R#3
  4: eval(ei, envir)
  3: eval(ei, envir)
  2: withVisible(eval(ei, envir))
  1: source("/tmp/test.R")
If you mean the lines in the packages you used, I think you'll see the lines if you build them with keep.source=TRUE
It's so crazy easy to start R in gdb:

  R -d gdb
that's it!
You might find this package useful https://github.com/robertzk/bettertrace

  options(error=recover)
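
For scripts that run without an interactive session, where `recover` cannot prompt, the call stack can also be captured by hand. A minimal sketch, assuming base R only; `with_stack`, `f` and `plot_check` are hypothetical names made up for illustration:

```r
# Hypothetical helper: evaluate an expression and, if it fails,
# keep both the error message and the call stack at the point of failure.
with_stack <- function(expr) {
  stack <- NULL
  result <- tryCatch(
    withCallingHandlers(
      expr,
      error = function(e) {
        # The calling handler runs while the failing frames are still live,
        # so sys.calls() still contains the calls that led to the error.
        stack <<- sys.calls()
      }
    ),
    error = function(e) conditionMessage(e)
  )
  list(result = result, stack = stack)
}

# Hypothetical functions that reproduce the length mismatch from the example above.
plot_check <- function(x, y) {
  if (length(x) != length(y)) stop("'x' and 'y' lengths differ")
  TRUE
}
f <- function(x, y) plot_check(x, y)

out <- with_stack(f(1:10, 1:20))
out$result   # the error message
out$stack    # the recorded calls, including f() and plot_check()
```

This is only one way to do it; whether line numbers appear still depends on the code having been parsed with source references kept.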
All the design brief said was "get started"; anything beyond that is a change request.
What are you talking about? Of course it was written in lower-level, compiled languages. Is it any different than Python? Javascript? Perl?

As for the "easy interop with other languages": [1]

So, how is R different from Python?

Also, it's not true that R wasn't designed for writing software. Even a critical "pamphlet" by Pat Burns [2] states otherwise. For writing statistical software, yes.

--

[1] https://wiki.python.org/moin/IntegratingPythonWithOtherLangu...

[2] https://www.burns-stat.com/pages/Present/infernoishR_annotat...

This is silly. I bet your colleagues are making poor R programs because they are not well versed in programming, not because R is inherently worse than anything else.

My experience is that people who make bad R programs also make bad python programs. I don't think you should blame a tool for issues caused by the programmer.

Try loading up any scientific package and you'll see how it, in turn, loads up more and more packages. In bioinformatics you can easily add up to dozens, if not a hundred, dependencies, each written in R by people with questionable skills.

there is no escape.

When I said "colleagues" I really meant the entire scientific field runs on untold lines of buggy R code, so obtuse, so cryptic, that the task of debugging or even tracing what is going on is practically impossible. And you can't debug it because it is this awful R code everywhere! And when the code breaks it does not break like normal programming languages do, with an error or exception or even a stack dump. No! Most of the time your R code will just start silently doing the wrong thing.

glofish, do you like functional programming paradigm languages like Lisp and Haskell? It's entirely possible that R is weird and unconventional giving a negative impression, due to it being FPP.

>try loading up any scientific package and you'll see how, in turn loads up other and other packages, in bioinformatics you can easily add up to dozens if not a hundred of dependencies

That's not necessarily a bad thing unless you're trying to run R on embedded or some other constrained environment.

>When I said "colleagues" I really meant the entire scientific field runs on untold lines of buggy R code, so obtuse, so cryptic

I can't recall the last time I've bumped into a bug in an R library. I'm sure they exist but thankfully the ecosystem is quite stable.

>that the task of debugging or even tracing what is going on is practically impossible.

Debugging in R is easier than most languages. I'm unsure where you're getting your facts from.

>And when the code breaks it does not break like normal programming language do, with an error or exception or even a stack dump. No! Most of the time your R code will just start silently doing the wrong thing.

It's no worse than Python in this regard. R isn't particularly bad in this area, but it's certainly no C++.

I'm going back to guessing it's because R is FPP. That's R's dirtiest and most offensive part to the uninitiated.

R is a great language with a powerful, but very dated standard library.

But don’t worry about R bringing science down - the scientific community can also write terrible python code.

This is why static analyzers, various linters, etc should become accessible and enforceable. Python or R, Julia or Fortran. It is time to force scientific developers to live in the same world the rest of development lives.
I strongly disagree. And in general, it seems like poor quality code in science is often not because of the language, but because scientists rarely lose their jobs when code breaks.

There are many books that cover how to develop in R in detail, and they are no less thorough than treatments of the subject in other languages (e.g. Hadley's books are as good as any I've read for python).

Many issues around inconsistency, etc, in language design (mostly how base functions / data types behave) have very clean, consistent implementations in libraries like rlang.

The main differences I see when comparing R vs python package code, that affect style are...

1. Most R operations are immutable.

2. R often uses single dispatch, rather than putting methods on a class object.

3. In R, vectorised behavior is often the norm.

4. R functions can choose to use lazy evaluation (it is usually very clear when this happens in e.g. tidyverse packages).

These issues are covered in detail in books like Hadley's Advanced R.
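
For readers coming from Python, the four points above can be sketched in a few lines of base R. This is only an illustration; all function and variable names here are made up for the example:

```r
# 1. Copy-on-modify: changing a vector inside a function leaves the caller's copy alone.
bump_first <- function(v) {
  v[1] <- 100
  v
}
original <- c(1, 2, 3)
changed <- bump_first(original)
# original is still c(1, 2, 3); changed is c(100, 2, 3)

# 2. S3 single dispatch: the generic picks a method from the class of its first argument.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something else"
describe.numeric <- function(x, ...) "a numeric vector"
describe.character <- function(x, ...) "a character vector"

# 3. Vectorised arithmetic: the operation applies to every element, no loop needed.
squares <- (1:5)^2

# 4. Lazy evaluation: an argument that is never used is never evaluated,
#    so the stop() below does not fire.
ignore_second <- function(a, b) a
lazy_result <- ignore_second(1, stop("never evaluated"))
```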

Hadley Wickham is a hero of the language who single-handedly tries to wrestle the slippery monster into sanity. I think that in the long term it is a losing battle, because in the end the language is still borked.

My hat off to him though!

As for the language ... consider this: lapply(), sapply(), tapply(), vapply() each does something different. The language even allows two kinds of assignment operators, a=1 or a <- 1, that are "almost" identical ... good luck: here is a language where there are two ways to even assign a value to a name.

>The language allows two kinds of assignment operators even: a=1 or a <- 1 that are "almost" identical

The <- assignment has a long history: R inherited it from S, which took it from APL, where the left arrow was a physical key on the keyboard. F# and OCaml also have a <- operator, although there it only updates mutable values.

Inside a function call, the = sign binds a named argument and assigns nothing in the calling scope, whereas <- performs a real assignment in the calling scope.

eg:

    median(x = 1:10)   ## [1] 5.5
    x  ## Error: object 'x' not found

    median(y <- 1:10)  ## [1] 5.5
    y  ## [1]  1  2  3  4  5  6  7  8  9 10
So therefore,

    x <- 1:10
    median(x)
is equivalent to

    median(x <- 1:10)
It's a convenient feature the language supports. The alternative is how Python does while loops. If anything, R comes out above in this regard.

edit: Python has the := operator which functions the same way <- does in R. I guess Python is catching up on this one.

eg (Python):

    env_base = os.environ.get("PYTHONUSERBASE", None)
    if env_base:
        return env_base
vs

    if env_base := os.environ.get("PYTHONUSERBASE", None):
        return env_base
Note that `median((x = 10)); x` works fine :-)
> here is a language there are two ways to even assign a value to a name.

three ways if you count a <<- 1 for writing in global variables from inside functions (of course, not a recommended practice...)
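
To be precise, `<<-` assigns to the nearest enclosing environment where the name already exists and only falls back to the global environment if it is not found anywhere. A minimal sketch, where `make_counter` is a hypothetical example function:

```r
# Hypothetical closure: 'count' lives in make_counter's environment,
# not in the global environment.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # updates 'count' in the enclosing function's environment
    count
  }
}

counter <- make_counter()
counter()  # 1
counter()  # 2
```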

I don't see what your point is though. So there are several ways to do the same thing in a language, so it's bad? Bad in what way?

As for the lapply, sapply, tapply, mapply, it's very well documented as to when and where you should use them. Sapply applies only on a single vector, and for generalization on larger data structures you use the other "applys". Nothing very hard to comprehend, and this is well explained in the official docs.
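
For reference, a small sketch of how the return types differ, using base R only. The input data is made up for the example:

```r
scores <- list(a = 1:3, b = 4:6)

# lapply always returns a list.
as_list <- lapply(scores, sum)

# sapply tries to simplify, so the result type depends on the input.
simplified <- sapply(scores, sum)    # a named integer vector here
empty_case <- sapply(list(), sum)    # an empty list, not an empty vector

# vapply takes the expected result type up front and errors on a mismatch.
checked <- vapply(scores, sum, integer(1))

# tapply applies a function to groups defined by a grouping vector.
group_means <- tapply(c(10, 20, 30, 40), c("x", "y", "x", "y"), mean)
```

The `empty_case` line is the usual argument for preferring `vapply` in package code: its result type does not change with the input.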

> single-handedly

There are many people in the R community working on this together (e.g. Jenny Bryan, Charlotte, etc).

> lapply(), sapply(), tapply(), vapply() each does something different.

The apply situation has been standardized through the purrr lib and dplyr for a long time. They are base library functions that aren't mandatory.

> two kinds of assignment operators even

Consider the custom of using <-. It reduces the kinds of assignment operators to 1. Similar to avoiding from lib import * in python. You can do it, but there are community standards against it.

This comes across as just another comment on why one language is "bad" when, as always, it all comes down to trade offs & preference when choosing a tool for a job.

The thing is, it's easy to write bug-ridden sloppy code in any language. Bemoaning R as a language because of these flaws, occurring due to rapid adoption, ignores the reasons why R has seen wide-scale adoption.

R has had an extreme democratizing effect on access to tools that facilitate data science. Previously, tools for data science were either massively expensive or had a prohibitively high price tag attached.

This means that many non-programmers are coming to R, and I maintain that the problems the parent post sees with R stem from that fact. As a result, any language that achieved that sort of layman (to programming) appeal would have the exact same bug or sloppy code fallout. You cannot have the momentum that led to such an accessible tool without having the same consequences. Rather than demonize the tool for this, we should recognize the positive dynamic at play and simply help guide users to better practices or improvements that would fix the issues.

I'm on the opposite side of the fence, but I'd love to hear some specifics on how R is ill-designed and encourages buggy programs.
I'm a heavy user of R, and I like using it a lot. But the language has lots of traps for beginners: code idioms that look correct but are subtly wrong. For example, if you need to iterate over the indices of a vector X, the obvious thing to do is 1:length(X), looks fine and works fine until you happen to pass a 0-length vector, and then it explodes. Similarly, the obvious way to select a subset of rows i and a subset of columns j from a matrix is X[i,j]. But that's wrong too, because if either i or j has length 1, you get a vector instead of a matrix. And I don't even remember off the top of my head what happens if either or both of i and j has length 0. The R Inferno[1] is essentially a big collection of cases like this.
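
The zero-length case can be sketched like this, with `seq_along()` as the usual base R alternative. `count_iterations` is a hypothetical helper that only counts loop passes:

```r
# Hypothetical helper: count how many times a for loop over 'index' runs.
count_iterations <- function(index) {
  n <- 0
  for (i in index) n <- n + 1
  n
}

X <- numeric(0)

1:length(X)                      # c(1, 0): counts down instead of being empty
count_iterations(1:length(X))    # 2 passes over a vector with no elements
count_iterations(seq_along(X))   # 0 passes, as intended
```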

None of this makes R a bad language, in my opinion. R is far from the only language with surprising edge cases like this. People say that R is designed for statistical analysis more than general programming, but I don't think that's exactly true either. Certainly it excels in writing code for statistical analysis, but I've used R a lot more than that, and I plan to continue. It's a perfectly fine general-use scripting language.

I think the real reason R gets such a bad reputation is that a lot of people writing and publishing R code aren't programmers by trade. And you know what? That's fine. Because I'd much rather work in a community that values and celebrates the publishing of code than one that shames people for releasing their code because it's "not good enough".

[1]: https://www.burns-stat.com/pages/Tutor/R_inferno.pdf

IMO the worst is accessing non-existent items in lists or when using the $ or [[ notation in data.frames: the fact that you get back NULL instead of an error breaks code in unexpected ways, and given that R's debug facilities are basically useless, makes it hard to debug complex code.
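
A minimal sketch of that behavior, plus one possible workaround. `config` and `get_strict` are hypothetical names used only for illustration:

```r
config <- list(threshold = 0.5)

config$treshold            # misspelled name: NULL, no error
config[["treshold"]]       # also NULL
length(config$treshold)    # 0, so downstream code may quietly do nothing

# Hypothetical strict accessor that fails loudly instead of returning NULL.
get_strict <- function(x, name) {
  if (!name %in% names(x)) {
    stop("no element named '", name, "'", call. = FALSE)
  }
  x[[name]]
}

get_strict(config, "threshold")                             # 0.5
res <- try(get_strict(config, "treshold"), silent = TRUE)   # an error, not NULL
```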
when indexing you can always pass `drop=FALSE` to prevent returning a vector. It will always return a matrix or data.frame.
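
A quick sketch of the difference on a small matrix, which also covers the zero-length index case asked about above:

```r
X <- matrix(1:6, nrow = 2)

dim(X[1:2, 1:2])                # 2 2: still a matrix
dim(X[1, 1:2])                  # NULL: a single row collapses to a plain vector
dim(X[1, 1:2, drop = FALSE])    # 1 2: stays a matrix
dim(X[integer(0), 1:2])         # 0 2: a zero-length index keeps the dimensions
```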
Still - that's an excellent example of something that's broken by default and outright dangerous for production use and at the same time very convenient when using interactively. There are probably half a dozen similar other features.

The vast majority of packages are written by somebody taking their interactive session and tidying it up with some functions and tests and then publishing it. But going through and weeding out all these "broken by default for edge case" aspects is a nightmare.

Here are some fun links:

https://www.talyarkoni.org/blog/2012/06/08/r-the-master-trol...

https://www.burns-stat.com/pages/Tutor/R_inferno.pdf

and many others.

I have come to believe that the only people that think R is ok are those that are either:

- beginners that just passed the newbie state, learned a few tricks and feel empowered

- actual experts - that fully understand the minute details of the implementation and data models

I have been using R on and off for a decade. As soon as I stop using it for a few months, getting back is like a tar pit where I am continuously caught off guard by the myriad ridiculous problems. Paradoxically, as you get better with R your errors start becoming more dangerous: your code starts silently doing the wrong things.

R is unlike any other programming language that I have used before (also on and off), from Perl and C to Python and Java. None of these programming languages has such an incredibly obtuse, illogical and trippy design.

> I have come to believe that the only people that think R is ok are those that are either:

You can virtually say the same thing for every programming language that is made to be easy to learn by hiding complexity, like Python or Ruby.

> one of the most ill-designed yet popular programming languages in existence

What would the others be? Python is one, I guess.

Baseline JavaScript probably; that would certainly be my vote.

Also potentially PHP?

I had forgot about PHP!

I have no opinion on JavaScript.

I guess some people may include Perl. Others won't, either because they don't think it's ill-designed or they don't think it's popular :-)

Python is not ill-designed at all
Base R is bad; R augmented by other packages (e.g. tidyverse and data.table) is just as performant/easy-to-use, if not more, than other data science tools.
I'm not a language zealot (most languages/frameworks come with pros and cons) and I use R and the tidyverse quite regularly, but "performant" is not a word I'd associate with the tidyverse. I'd be surprised if it wasn't slower than most alternatives (can't claim to have systematically benchmarked it, still....), even if it's often easier to use and usually "good enough."
Ironically, I find students who rely on the tidyverse to be the most vulnerable to "This isn't working and I'll never be able to figure out why."
Base R is fine. I would much rather not include the tidyverse and all of its transitive dependencies.
> But for actually programming it is atrocious.

I really hope you are not using R for anything outside data science, physics, or other analysis. It was developed to do these things, not 'actual programming', which I imagine you define as creating some framework or application.

Most of the people that don't like R seem to want to use it outside of its use cases, and get frustrated when they fail.

> R does a few things well - cross-platform, plotting works on all platforms, packaging works well. But for actually programming it is atrocious.

Yet R is a lot more expressive than Python + Pandas for data related applications. It was never made as a universal language to develop any kind of applications, but it's pretty good at what it does with data manipulation.

It depends. R excels at backward compatibility and at interactive data analysis, which is what it's made for. But you're right in so far that you probably shouldn't use (much) R code in production.
I agree! That is what R was designed for. Puttering around in the R shell, slicing, dicing data live, doing some interactive plot this, plot that - alas that is not how R is used anymore
... that's exactly what I use R for. At work, I use it to filter data from the FAA database of registered aircraft. Or to poke around whatever CSV data I need some specific details from that day.

I thought that was what everyone was using it for. What are people using it for?

heh, try installing a single advanced package, you'll see immediately how hundreds of interdependent libraries are also loaded and compiled, each full of bugs and problems
"Hundreds" is an exaggeration. You make it sound as if libraries in other languages were bug free. In my experience, most libraries work well and most authors respond rather quickly to requests.

Anyway, I think it's better to make use of a small, commonly used, and well tested library instead of reinventing the wheel again and again. Libraries are not one of the concerns I have with R.

load up a bioinformatics R library in bioconductor, see what happens, a few dozen would be the low estimate. And make no mistake each does fairly complex tasks.

Now what if I told you that the majority (perhaps all) of the libraries you loaded, which are needed to run the complex analyses in life sciences, were developed by people who are oblivious to proper software engineering? These were never meant to be used the way they are used: they expose a myriad of global variable names, methods etc.

You say you are using a small well-tested library with R, sure - but that is not what happens in science and for those that see what is going on, we know we're completely FKDd

The tragedy is that we cannot cure cancer as long as we try to do it with R - and R is not going anywhere ...