by dmlorenzetti 3778 days ago
> The upshot is that unless you carefully read the apply() documentation..., you’re hosed.

One thing that jumps out at me, having returned to R after several years in the Python world, is how opaque its documentation can be.

The standard format for R documentation does a few things that I find impede understanding. First, the help pages are organized into sections giving the high-level description, the arguments, the details, and the results ("Value"). The details are generally organized by argument keyword, and the arguments section draws on language laid down, usually in a vague, high-level way, by the description section. Finally, the practical effects of the details are deferred until the results section. That means that unless you already know what's going on, you end up jumping around among sections, trying to synthesize everything.

This is particularly a problem for those help pages-- and there are a lot of them-- that describe a raft of related functions all at the same time. Describing a bunch of related functions in the same place sounds like a good idea (it should help you figure out `apply` vs `sapply`, right?). Yet this is exactly when the documentation organization results in the most scattershot reading, because in addition to having to synthesize between sections, you have to mentally prune away text that, for one reason or another, doesn't apply to your particular case (for example, because different functions don't all share the same arguments, or because you want to read about the values for just one variation on the function).
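
To make the `apply` vs `sapply` question concrete, here's a small sketch of how the return types differ. The matrix `m` and the squaring function are illustrative choices of mine, not anything taken from the help pages.

```r
# Illustrative data (my own example, not from the R help pages)
m <- matrix(1:6, nrow = 2)            # 2 rows, 3 columns, filled column-wise

row_sums <- apply(m, 1, sum)          # MARGIN = 1: call sum on each row
col_sums <- apply(m, 2, sum)          # MARGIN = 2: call sum on each column

as_list   <- lapply(1:3, function(i) i^2)   # always returns a list
as_vector <- sapply(1:3, function(i) i^2)   # simplifies to a vector when it can

# With empty input, sapply has nothing to simplify and returns an empty list,
# whereas vapply returns the type declared in its third argument.
empty_s <- sapply(integer(0), function(i) i^2)
empty_v <- vapply(integer(0), function(i) i^2, numeric(1))

print(row_sums); print(col_sums)
str(as_list); str(as_vector)
str(empty_s); str(empty_v)
```

The empty-input case is the kind of detail that is easy to miss when the Value section covers several functions at once.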

Another idiom I dislike in the standard R documentation is how the examples don't actually show any sample output. There are generally some attempts at comments to explain what the sample code should or shouldn't do, but they are very much written in the style of programmer's comments, not in the style of documentation or learning points. So you end up having to run the code, and sometimes puzzle over the results for a while.

Here's an example, from the help page that I happen to have open right now, `help(sample)`:

    # sample()'s surprise -- example
    x <- 1:10
    sample(x[x >  8]) # length 2
    sample(x[x >  9]) # oops -- length 10!
    sample(x[x > 10]) # length 0

The comments alert me that there's a "surprise" in store, and they even allude to the (apparently surprising) fact that the second line produces a 10-vector. Notably lacking is any explanation of what's meant to be surprising here, how that relates to the internal logic of `sample`, or how to avoid falling into the trap.
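
For what it's worth, here is my reading of the trap, as a hedged sketch rather than an authoritative account. When `x` is a single number that is at least 1, `sample(x)` samples from `1:x` instead of from the one-element vector. The `resample` helper below indexes by position to sidestep that special case. As far as I can tell the same one-liner appears in the examples of `help(sample)`, but treat the surrounding commentary as mine.

```r
x <- 1:10

# x[x > 9] is the single number 10, so sample() treats the call as sample(1:10)
surprise <- sample(x[x > 9])
length(surprise)                 # 10, not 1

# Sample positions rather than values, so a length-1 vector stays length 1
resample <- function(x, ...) x[sample.int(length(x), ...)]

two  <- resample(x[x >  8])      # length 2: 9 and 10 in some order
one  <- resample(x[x >  9])      # length 1: always 10
none <- resample(x[x > 10])      # length 0
```

The point is that the length of the result depends on the length of the input only if you never pass `sample` a bare number, which is hard to guarantee when the input comes from a filter like `x[x > 9]`.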

Overall, I feel like R's documentation is a bit like a conversation among experts, with a rather sink-or-swim attitude towards newcomers.

Documentation is far from the first thing that stands out about R vs Python, but it's the most salient, I think, in the context of the original article.

1 comment

> This is particularly a problem for those help pages-- and there are a lot of them-- that describe a raft of related functions all at the same time. Describing a bunch of related functions in the same place sounds like a good idea (it should help you figure out `apply` vs `sapply`, right?). Yet this is exactly when the documentation organization results in the most scattershot reading, because in addition to having to synthesize between sections, you have to mentally prune away text that, for one reason or another, doesn't apply to your particular case (for example, because different functions don't all share the same arguments, or because you want to read about the values for just one variation on the function).

This reminds me very strongly of man pages. Man pages group either similar (man 3 printf) or closely related (man 3 malloc) functions and intersperse bits about each of the functions documented by the page. The result ranges from difficult to read to mind-boggling (when half a dozen near-identical functions are documented at the same time). The lapply documentation page[0] looks very similar in organisation, and is similarly difficult to parse and use.

> The comments alert me that there's a "surprise" in store, and they even allude to the (apparently surprising) fact that the second line produces a 10-vector. Notably lacking is any explanation of what's meant to be surprising here, how that relates to the internal logic of `sample`, or how to avoid falling into the trap.

On http://www.inside-r.org/r-doc/base/sample the surprise is explained in the first paragraph of the details, with one hell of an understatement ("this convenience feature may lead to undesired behaviour") but without the big red blinking box it definitely deserves.

[0] https://stat.ethz.ch/R-manual/R-devel/library/base/html/lapp...