by 12423gsd 4224 days ago
The RStudio guys have really made R a pleasure to use. Thank you guys!

The core language is still a confusing mess (I'm never sure when to use a matrix, a data frame, a list...), but if you use their tools you can ignore it for the most part.

In under 10 lines you can massage data and generate fantastic graphics.

A little off topic, but does anyone know what their business model is? Are they going to run out of money and burn out in a year or two?

4 comments

If you're confused about R's data structures, please read http://adv-r.had.co.nz/Data-structures.html and let me know if it doesn't help.

And no, we're not planning on burning out. We currently sell three things:

* RStudio Server Pro. A commercial version of the open-source server version that provides stuff that corporate IT wants (e.g. monitoring, more auth options, ...)

* Shiny Server Pro. A more flexible version of the open-source shiny server that offers more configurability (e.g. number of R processes per app), and again other stuff that corporate IT wants.

* The right to use the RStudio desktop IDE, for companies that don't want to use AGPL software

Here's how I think of it, which has been working for me:

matrix - If you have data that would make sense to be in a spreadsheet-type format and all your data are numbers.

dataframe - If you have data that would make sense to be in a spreadsheet-type format and some columns are numbers but other columns are something else (character strings, dates, TRUE/FALSE); but each column is only one thing. That is, you have one column that's all dates, another column that's all numbers, yet another column that's all character strings, etc.

list - If you need to mix data types within a certain entity (vector or column of data).

Unless you're doing linear algebra (or really care about memory usage), you almost never need to use a matrix in R.

To piggyback on what hadley said a bit, I find thinking of a data frame as a "collection of records", and a matrix as "two dimensional data" to be a bit better.

One useful question to ask is "Does it make sense to sort this data by something?" If so, you have a data frame. Whereas if you want to perform matrix math on something (inverting it, multiplying it by another matrix, reducing it, etc.), you have a matrix. Things that I use a matrix for can generally also be expressed as a data frame with columns rowId, colId, and value. If it doesn't make sense in that format, a matrix is generally not the appropriate structure.
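
A rough sketch of the rowId/colId/value idea, written in plain Python rather than R so it runs without any packages. The helper names (`matrix_to_records`, `records_to_matrix`) and the sample values are made up for illustration; nested lists stand in for a matrix and a list of dicts stands in for a data frame.

```python
# Hypothetical helpers: a nested list plays the role of a matrix,
# a list of dicts plays the role of a data frame of records.

def matrix_to_records(matrix):
    """Flatten a 2-D list into rowId/colId/value records (1-indexed, as in R)."""
    return [
        {"rowId": i, "colId": j, "value": value}
        for i, row in enumerate(matrix, start=1)
        for j, value in enumerate(row, start=1)
    ]


def records_to_matrix(records):
    """Rebuild the 2-D list from rowId/colId/value records."""
    n_rows = max(r["rowId"] for r in records)
    n_cols = max(r["colId"] for r in records)
    matrix = [[None] * n_cols for _ in range(n_rows)]
    for r in records:
        matrix[r["rowId"] - 1][r["colId"] - 1] = r["value"]
    return matrix


m = [[1.0, 2.0], [3.0, 4.0]]
records = matrix_to_records(m)

# Sorting makes sense for the record form, not for the matrix itself.
by_value = sorted(records, key=lambda r: r["value"], reverse=True)
```

The round trip only works cleanly because every cell holds the same kind of value, which is roughly the test being described: if the data does not fit the three-column form, it probably was not a matrix to begin with.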

That's a great explanation! Data frame for data analysis; matrix for math.

I'd amend that a little: use a matrix when you're actually calculating statistics (internally to the function). Clean your data so it always fits in a data frame when you load it. Lists are for representing things like data scraped from HTML before converting it to a data frame.
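
A minimal sketch of that "list first, data frame after cleaning" workflow, again in plain Python rather than R. The `scraped` records, the column names, and the `to_row` helper are all hypothetical; the point is only that the ragged, mixed-type stage lives in a list, and the cleaned result has one type per column.

```python
# Hypothetical scraped records: ragged and mixed-type (the "list" stage).
scraped = [
    {"name": "alice", "age": "34", "tags": ["a", "b"]},
    {"name": "bob"},
    {"name": "carol", "age": "41"},
]

COLUMNS = ["name", "age", "n_tags"]


def to_row(record):
    """Normalize one scraped record so every row has the same typed fields."""
    age = record.get("age")
    return {
        "name": record["name"],
        "age": int(age) if age is not None else None,  # None marks a missing value
        "n_tags": len(record.get("tags", [])),
    }


rows = [to_row(r) for r in scraped]

# Column-oriented view (the "data frame" stage): one type per column.
table = {col: [row[col] for row in rows] for col in COLUMNS}
```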
It's always great when you spend 10 hours trying to debug something and then find out from a mailing list that it's actually a bug in R. :(

The business model is enterprise sales and consulting: http://www.rstudio.com/pricing/

FWIW we don't do any consulting, although we do a decent amount of training.