by vsbuffalo 3823 days ago

I really like all the plotting systems in R. First, I used base graphics for a few years and loved it. You learn your way around par() and commit esoteric argument names to memory (oma, mar, mgp, mfrow, etc.). It feels powerful: you're just drawing on a screen, and its history traces back to the original pen plotters. Second, I learned lattice. You can't help but fall in love with lattice after a year or two of creating panel plots in base graphics. The steepest part of the lattice learning curve is panel functions, but once you learn to throw a browser() into a panel function to inspect the variables on the stack, you can do anything. Somewhere on a dusty bookshelf is a well-worn lattice book I splurged on while taking an R course at UCD.

I like this article because I think the author has a point about production graphics. If you're placing lines, points, and labels on a screen, you can create anything. You can draw polygons and arcs. It's like drawing with raw SVG. But I have a hard time thinking of an exploratory data analysis situation where I wouldn't reach for ggplot2 first. Since it looks at dataframe column types (integers, factors, numerics), it automatically matches them to the appropriate type of color gradient. Coloring a scatter plot by a potential confounder is one additional argument to aes(), e.g. aes(x, y, color=other_col). More than once during EDA I've done this and seen some horrifying pattern in the data that shouldn't be there. That's a powerful tool for one extra function argument: the cost of checking for a confounder with color (or shape) is essentially zero.
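A minimal sketch of that one-argument check, assuming the ggplot2 package is installed. The built-in mtcars data and the column choices (wt, mpg, cyl, hp) are stand-ins picked for illustration, not anything from the original comment:

```r
library(ggplot2)

# mtcars ships with R. cyl is stored as a numeric, so convert it to a factor
# to get a discrete palette rather than a continuous gradient.
cars <- transform(mtcars, cyl = factor(cyl))

# Baseline scatter plot.
p_plain <- ggplot(cars, aes(x = wt, y = mpg)) + geom_point()

# Same plot colored by a candidate confounder: one extra argument to aes().
p_factor <- ggplot(cars, aes(x = wt, y = mpg, color = cyl)) + geom_point()

# A numeric column is mapped to a continuous gradient instead.
p_numeric <- ggplot(cars, aes(x = wt, y = mpg, color = hp)) + geom_point()

# Paneling by the same column is one more term.
p_facets <- p_plain + facet_wrap(~cyl)

print(p_factor)
```

Swapping `color = cyl` for `shape = cyl` gives the shape-based version of the same check.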

I'd make the case that this is a more costly operation in base graphics, and is thus much less likely to be done. You may already have your plots in a for loop to create panels, plus a few extra lines for adjusting margins and axes (rather than facet_wrap(~col)). It took a lot of code to set that up, so there's already a lot of cruft when you just need to do a quick inspection. Then you need to create a color vector of the appropriate size and map it to the data. Sure, it's easy-ish, but it takes at least twice as long as color=some_col. In EDA visualization, I want every single barrier to checking a confounder to be as small as possible, which is what ggplot2 does.
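For comparison, a rough sketch of the base-graphics route described above, using only base R. The mtcars data, the choice of gear as the confounder, and the three color names are illustrative assumptions, and this is one way to write it rather than the only one:

```r
# Split the data by the paneling variable by hand.
panels <- split(mtcars, mtcars$cyl)

# Build the color lookup manually: one color per level of the confounder.
gear_levels <- sort(unique(mtcars$gear))
palette_cols <- c("steelblue", "darkorange", "forestgreen")[seq_along(gear_levels)]
names(palette_cols) <- gear_levels

# Set up the panel grid and margins, keeping the old settings to restore later.
old_par <- par(mfrow = c(1, length(panels)),
               mar = c(4, 4, 2, 1), oma = c(0, 0, 2, 0))

for (cyl in names(panels)) {
  d <- panels[[cyl]]
  # Map each row's gear value to its color, then draw one panel.
  point_cols <- unname(palette_cols[as.character(d$gear)])
  plot(d$wt, d$mpg, col = point_cols, pch = 19,
       xlim = range(mtcars$wt), ylim = range(mtcars$mpg),
       xlab = "wt", ylab = "mpg", main = paste("cyl =", cyl))
}

# The legend lands in the last panel drawn; the title goes in the outer margin.
legend("topright", legend = names(palette_cols), col = palette_cols,
       pch = 19, title = "gear")
mtext("mpg vs. wt by cyl, colored by gear", outer = TRUE)

par(old_par)
```

The split, the color lookup, the shared axis limits, and the legend are all things ggplot2 handles implicitly once the column is named in aes().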

That said, I really liked this article because I do agree that going from EDA visualization to production is a hassle. Just after reading it, I remade some production ggplots with base graphics and love the simple aesthetic, which takes a lot of work to mirror in ggplot.

What I really long for is a lower-level data-to-visualization mapping (like d3) in R. d3 is a pain to learn, but it's really the only data abstraction (even though it is a low-level one) that is seemingly limitless in what it does and can do. I keep hoping for a general data-join grammar like d3's to become the norm, built on top of base plotting (analogously: SVG elements), with abstractions like ggplot for tabular data built on top of that.

3 comments

Lofkin 3822 days ago

What do you think of bokeh: https://github.com/DataWookie/MonthOfJulia

pwang 3822 days ago

> that is seemingly limitless in what it does and can do

With great power comes terrible debugging!

Have you checked out rBokeh? Full browser interactivity, support for many more points than D3 (and way more if you turn on webGL), support for both server-based and serverless interactivity, all straight from R. http://hafen.github.io/rbokeh/

phillc73 3822 days ago

There are quite a lot of R htmlwidgets which interface with d3.[0]

[0] http://www.htmlwidgets.org/showcase_metricsgraphics.html