by haberman 5190 days ago
I've written a few hundred lines of R sporadically over the last several years. The absolute worst thing about it in my opinion is the type system. It does not matter how many times I use R, I cannot for the life of me remember or understand the difference between vectors, arrays, lists, data frames, and matrices. A list is sort of like a mix between an array and a map, a matrix is sorta like a 2d vector but can have row/column names, an array is like a matrix but different, and a data frame is like a heterogeneous matrix. And converting between them is always tricky.

As much as R may be capable of, I just can't get past how inconsistent and complicated its basic types are.


The terminology is weird. I'm not an R expert, but here's how I think of it:

vector: this one is clear based on the name; it's a homogeneous sequence (with very aggressive type conversion). A sequence of strings, a sequence of numerics, etc. One thing worth knowing is that there are no scalar types, so identical(c(1), 1) is TRUE. That is, the value 1 is the same thing as the singleton vector containing 1. Also the empty vector c() is identical to NULL! is.null(c()) is TRUE. Weird.
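To make the vector points concrete, here is a small sketch using only base R. The variable name `mixed` is just for illustration.

```r
# A vector is homogeneous: mixed inputs are coerced to one common type.
mixed <- c(1, "a", TRUE)
class(mixed)          # "character"

# There is no separate scalar type: 1 is already a length-one vector.
identical(c(1), 1)    # TRUE
length(1)             # 1

# c() with no arguments returns NULL.
is.null(c())          # TRUE
```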

list: the name is confusing, but I think of it basically like a dict in Python. And the syntax is the same: list(a=1, b=2) vs dict(a=1, b=2). You can use it like a sequence as you are saying (it is ordered, and elements don't need names), but I never use them that way. Lists are for ad hoc composite types -- if I want to return 2 values from a function, I return a list() of them. You can convert lists to environments easily with list2env(), though they are separate types -- environments are also similar to Python's dicts.
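A rough sketch of the list usage described here, in base R. `point`, `stuff`, and `min_max` are made-up names for illustration, not anything from a library.

```r
# Named access, much like a Python dict.
point <- list(a = 1, b = 2)
point$a          # 1
point[["b"]]     # 2

# A list is also an ordered sequence and can hold mixed types.
stuff <- list(1, "two", c(3, 4))
length(stuff)    # 3
stuff[[3]]       # the numeric vector 3 4

# Returning two values from a function (min_max is a hypothetical helper).
min_max <- function(x) list(lo = min(x), hi = max(x))
res <- min_max(c(5, 2, 9))
res$hi           # 9

# Converting a list to an environment is one call.
env <- list2env(point)
get("a", envir = env)   # 1
```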

data frame: This is the core type AFAICT, it is basically a collection of named column vectors of the same length. e.g. data.frame(name=c("a", "b", "c"), value=c(1,2,3)). This seems pretty intuitive. A row has different types (like a DB relation) but each column has a single type since a column is a vector.
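Using the same data.frame() call as above, a quick sketch of the row/column distinction (`df` and `row2` are illustrative names):

```r
df <- data.frame(name = c("a", "b", "c"), value = c(1, 2, 3))

# Each column is a vector with a single type.
class(df$value)   # "numeric"
nrow(df)          # 3
ncol(df)          # 2

# A row mixes types, so extracting one gives a one-row data frame, not a vector.
row2 <- df[2, ]
class(row2)       # "data.frame"
row2$value        # 2
```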

matrix: I don't use these too much, but it basically seems like a homogeneous type like vector, except you specify the dimensions.

array: I don't use this, but the R documentation says "A 2-dimensional array is the same thing as a matrix". So I think I am confused and what I typed above is an "array", and matrix is the special 2D case. Yes the names are bad. I think of a matrix as having an arbitrary number of dimensions (e.g. in matlab).
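The matrix/array relationship can be checked directly in base R. This is only a sketch; `m`, `a`, and `v` are illustrative names.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
dim(m)            # 2 3
m[2, 3]           # 6
is.matrix(m)      # TRUE
is.array(m)       # TRUE: a matrix is a 2-dimensional array

a <- array(1:24, dim = c(2, 3, 4))
dim(a)            # 2 3 4
is.matrix(a)      # FALSE: three dimensions

# Underneath, both are a vector with a dim attribute.
v <- 1:6
dim(v) <- c(2, 3)
is.matrix(v)      # TRUE
```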

I think where it gets confusing is that there are all these arbitrary conversions. And you can use things in more ways than the prescribed ones, so you might stumble across code that uses them wrong. But after a fair amount of R programming, that is my mental model, whether right or wrong :)
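A couple of examples of the kind of implicit conversion that tends to surprise people, sketched in base R (names are illustrative):

```r
m <- matrix(1:6, nrow = 2)

# Pulling out one row silently drops the matrix down to a plain vector...
row1 <- m[1, ]
is.matrix(row1)                    # FALSE

# ...unless you ask it not to.
row1_kept <- m[1, , drop = FALSE]
is.matrix(row1_kept)               # TRUE

# Flattening a mixed list coerces everything to one common type.
flat <- unlist(list(1, "a", TRUE))
flat                               # "1" "a" "TRUE"
```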

I think a lot of the mess comes from the fact that dealing with real data is just messy. R takes the mess and makes the common case convenient, and people like that. But it's like Perl in that it's a "Do what I mean" language and tries to guess a lot, rather than "Do what I say" like Python. And when it guesses your intent wrong it can leave you very frustrated, as with Perl.

Hi chubot,

Two things:

1) A data.frame is in fact a list of vectors of the same length "compacted" together.

2) I find the types very "sensible" for a person doing statistics. But I guess (almost) everything makes sense once you get used to it...
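On point 1, the "list of vectors" view can be seen from the R prompt. A minimal sketch, reusing the example data frame from earlier in the thread:

```r
df <- data.frame(name = c("a", "b", "c"), value = c(1, 2, 3))

is.list(df)        # TRUE
typeof(df)         # "list"
length(df)         # 2: the number of columns, i.e. list elements
names(df)          # "name" "value"

# List-style access works on columns.
df[["value"]]      # 1 2 3

# Columns whose lengths don't line up are rejected with an error.
bad <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```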