by nathell 3778 days ago
I have a love-hate relationship with R, being a predominantly Clojure (and Ruby these days) programmer who only occasionally dabbles in data crunching.

The apply/sapply dichotomy that the article mentions (actually a hexachotomy: there are also lapply, mapply, tapply and vapply) is one example of a gazillion warts that the language has.

Another random one: R has a useful function, paste, that concatenates strings together. Only it takes varargs, not a character vector, so if you have a vector v of strings, you have to use do.call(paste, v). Only not, because do.call insists that its second argument be a list, not a vector, so you do do.call(paste, as.list(v)). And if you want to separate the strings, say, by commas, you have to affix the named argument sep, obtaining do.call(paste, c(as.list(v), sep=",")).
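
For illustration, the steps described above can be sketched roughly as follows. The vector `v` and its contents are made up for the example, and only base R is assumed:

```r
# Hypothetical input: a character vector of strings to join.
v <- c("a", "b", "c")

# do.call() rejects a bare vector as its second argument,
# so capture the error message instead of stopping.
err <- tryCatch(do.call(paste, v), error = function(e) conditionMessage(e))

# Converting to a list passes each element as a separate argument to paste().
spaced <- do.call(paste, as.list(v))

# Appending a named `sep` element to the argument list sets the separator.
joined <- do.call(paste, c(as.list(v), sep = ","))

print(spaced)
print(joined)
```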

And R's three mutually incompatible object systems. And so on and so on and so on.

There are things to love. The packaging system works really well. I like the focus that R puts on documentation: hardly anywhere is it so comprehensive, with vignettes and all. There are things plainly inspired by Lisp (R is just about the only non-Lisp I know that has a condition and restart system akin to CL). And ggplot2 is one hell of a gem of an API.

In many ways, R is the PHP of data science. (Though the core language is still nowhere near as abysmal as PHP.) Despite all the warts, there are all sorts of statistical analyses that are just an install.packages() away. Put another way, R is to data science what LaTeX is to typesetting. It's a heavy pile of duct tape, but it's here to stay because it's just so damn useful.

3 comments

See, this is another interesting example of the kind of behavior described here: https://news.ycombinator.com/item?id=11113042

People who don't take the time to learn the language are having to go through these contortions to make R work the way their favorite language works, rather than just taking the time to learn how R works!

    R has a useful function, paste, that concatenates strings together.
    Only it takes varargs, not a character vector, so if you have a
    vector v of strings, you have to use do.call(paste, v).
But the help for the `paste` function literally goes over this exact situation:

    > v <- 1:5
    > paste(v, collapse = ",")
    [1] "1,2,3,4,5"
I'm often super baffled by the lengths people will go to not figure out how to use R and insist on writing <X> language in R.

Thanks for pointing this out. I overlooked it, presumably because it's in the last paragraph of "Details" and not illustrated in any example.

I still maintain there's a wart in what I'd described, which is `do.call` not accepting vectors as the second argument. Also, `collapse` is idiosyncratic: I have to remember a special knob for every function that has a vararg and non-vararg flavour.
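
To make the distinction concrete, here is a small sketch of how `sep` and `collapse` differ in base R's `paste`. The sample vectors `x` and `y` are invented for illustration:

```r
# Illustrative inputs (not from the thread).
x <- c("a", "b")
y <- c("1", "2")

# `sep` joins corresponding elements across the arguments,
# giving one string per position.
by_sep <- paste(x, y, sep = "-")

# `collapse` joins the elements of the resulting vector into a single string.
by_collapse <- paste(x, collapse = "-")

# Both can be combined: element-wise join first, then collapse.
both <- paste(x, y, sep = "-", collapse = ",")

print(by_sep)
print(by_collapse)
print(both)
```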

You raise the point of taking the time to learn the language, and I acknowledge this. Yet, as an occasional user, this is precisely what I'd like to avoid. When working with R, I'm pragmatic: what I'm after is a working solution to the problem at hand, rather than its most succinct or elegant formulation. When I find one, I move on. In production code this would incur technical debt, but due to R's exploratory nature, this is typically not much of a problem. Were the language more consistent, it would take less time to learn it thoroughly.

You might enjoy purrr, https://github.com/hadley/purrr, which is my attempt to make FP tools in R more consistent.

I wonder what disagreements or counterarguments downvoters might have, and if they might share them.

For the parent, given what you've observed, do you still go to R for data crunching, or have you found anything in Clojure land that measures up?

R. Or a mixed approach, with Clojure for data preprocessing and R for the analysis proper. Case in point: I wrote the Clojure scraping library, Skyscraper [1], and made it output CSV by default so as to be able to easily drop the resulting files to R.

For statistics, Clojure has Incanter, but it's very basic in comparison. There are easily usable Java libraries for certain tasks (MALLET comes to mind), but these are few and far between.

[1]: https://github.com/nathell/skyscraper/

You can also use paste(v, collapse = ",")