by ltjohnson 5589 days ago
I am a statistician that does both research and applied work.

I use R for three reasons: (1) It's Free Software; (2) It's a programming language; (3) Other statisticians use it so it's easier for me to collaborate.

The usual supporting arguments apply for (1). For (2), I've only used SAS a little, and it was extremely unpleasant to use for anything that isn't built in, which makes research harder for no good reason. For (3), I have nothing against Python, but most other statisticians don't use it. If I want to share my work in R, it's easy (statisticians know how to install R packages). If I want to share my work in Python, I first have to teach [most] other statisticians how to use Python. There's nothing wrong with that, but why raise the start-up cost for them?

tl;dr I conjecture that most statisticians don't want what the author is suggesting. Also, there are plenty of companies that are trying to do what the author is asking for, but most of them seem to miss the desired sweet spot, or charge lots of money, or both. I haven't taken a survey of the available software in quite some time.


Can you recommend a book to get started with R?
No, but I can give some suggestions. It would help to know what you want to do.

First, decide whether you want a language reference or an application guide, since R books tend to fall into those two categories.

If you have a specific type of work in mind (bio-informatics, data mining, data visualization, ...), I'd look for a book that focuses on that topic. I haven't looked in a while, but I haven't seen a general R book that I like, so anything I suggested there would be a guess on my part.

There are plenty of good references on the web. I'd start by looking at the material available from the R web site:

R's core manuals [1] are typically correct and reasonable to use. The "Introduction to R" guide will get you up to speed fairly well if you already know another programming language. There is also the contributed documentation [2]. I haven't gone through these, so I can't say much about them, or promise that they are up-to-date. I suspect not, as R develops rapidly. The one reference I can recommend highly is "The R Inferno" by Patrick Burns [3]. This is not a starter guide, but something you read after one. It gives excellent advice on avoiding common pitfalls in R.

[1] http://cran.r-project.org/manuals.html

[2] http://cran.r-project.org/other-docs.html

[3] http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Thanks. I do biology with limited amounts of data, and my needs are very basic. Here is some software I wrote for sleep analysis in Drosophila: http://www.pysolo.net

So far I have been able to meet most of my statistics needs with the functions in numpy and scipy, but occasionally I need to do something slightly fancier, and I guess R is the way to go.

Possibly. R is really great at "fancy" statistical analyses, but it's very lousy at things like text manipulation. When a project needs some text manipulation on the front end, I often use other tools (Python, vi, sed, ...) to beat the text data into a nicer form for R. I couldn't say more without knowing more about your project.
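As a rough illustration of that kind of front-end cleanup, here is a minimal Python sketch, not anyone's actual pipeline. The raw line format, the regular expression, and the function name `to_tidy_csv` are all hypothetical, invented for this example; the only point is turning messy text into a plain CSV that R can load directly.

```python
import csv
import io
import re

# Hypothetical raw format (an assumption for illustration only), e.g.:
#   "fly 3 | condition: Control | sleep=412 min"
# Each useful line is assumed to carry a fly ID, a condition label, and minutes of sleep.
LINE_RE = re.compile(
    r"fly\s*(?P<fly>\d+)\s*\|\s*condition:\s*(?P<condition>\w+)"
    r"\s*\|\s*sleep\s*=\s*(?P<minutes>\d+)\s*min",
    re.IGNORECASE,
)


def to_tidy_csv(raw_text: str) -> str:
    """Return a CSV string with one row per line that matches LINE_RE.

    Lines that don't match (headers, comments, blanks, junk) are skipped.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["fly", "condition", "sleep_min"])
    for line in raw_text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a data line; drop it
        # Normalize case so "Control" and "control" become the same factor level in R.
        writer.writerow([int(match["fly"]), match["condition"].lower(), int(match["minutes"])])
    return out.getvalue()


if __name__ == "__main__":
    raw = """# exported run
fly 3 | condition: Control | sleep=412 min
FLY 7 | condition: deprived | sleep = 198 min

garbage line
"""
    print(to_tidy_csv(raw), end="")
```

Saved to a file, the output could then be read in R with something like `read.csv("sleep.csv")` (the file name is just an example). Silently skipping non-matching lines is convenient for headers and comments, but it also means it's worth checking the row count against what you expect.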
I always seem to come back to "Introductory Statistics with R" [1]. It gives a lot of examples of how to do "the day-to-day stuff". Also, since the statistical content is mostly (very) introductory, as the title suggests, it's really easy for me as a reader to follow what's going on in each example: it's easy to tell which parts are specific to the example itself and which parts are generic R, if that makes any sense.

[1] http://www.powells.com/biblio/65-9780387790534-0

Here's a site I always go to for reference: http://statmethods.net/

If you really want a book, I would recommend "Data Analysis and Graphics Using R" by Maindonald and Braun, http://books.google.com/books?id=d7OeVD6SKBsC.