by uryga
1803 days ago
I would recommend getting comfortable with doing stuff in base R first, then trying tidyverse. Starting with dplyr might get you results quickly, but its "special evaluation" actively confuses your understanding of how the base language actually works (speaking from experience with an R course and subsequently helping other confused folks).

Consider this example:

# base R
starwars[starwars$height < 200 & starwars$gender == "male", ]
# dplyr
starwars %>% filter(
height < 200,
gender == "male"
)
(Source: https://tidyeval.tidyverse.org/sec-why-how.html)

Where'd `height` and `gender` come from in the dplyr version?
They're just columns in a DF, not variables, and yet they act like variables...
Well, that's the dplyr magic, baby! dplyr (and other tidy stuff) achieves this "niceness" by doing a whole bunch of what amounts to gnarly metaprogramming[1] -- that example was taken from a whole big chapter about "tidy evaluation", describing how it does all this quote()-ing and eval()-ing under the hood to make the "nicer" version work. It's (arguably) more pleasant to read and write, but much harder to actually understand -- "easy, but not simple", to paraphrase a slightly tired phrase.

---

[1] IIRC it works something like this: the expressions

height < 200
gender == "male"
are actually passed to `filter` as unevaluated ASTs (think Lisp's `quote`), and then evaluated in a specially constructed environment with added variables like `height` and `gender` corresponding to your dataframe's columns. IIRC this means it can do some cool things like run on an SQL backend (similar to C#'s LINQ), but it's not something I'd expose a beginner to.
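
To make the footnote concrete, here's a minimal base R sketch of the same quote-then-eval trick. `my_filter` and the `people` data frame are made up for illustration, and this is not how dplyr is actually implemented (the real thing goes through its tidy evaluation machinery); it only shows the general idea of capturing an expression and evaluating it with the columns visible as variables:

```r
# Toy stand-in for dplyr::filter(), base R only.
# `my_filter` is a hypothetical name, not part of any package.
my_filter <- function(df, cond) {
  # Capture the argument as an unevaluated expression (an AST),
  # instead of letting R evaluate `height < 200` right away.
  expr <- substitute(cond)

  # Evaluate that expression with the data frame's columns acting as
  # variables; anything not found there is looked up in the caller's
  # environment.
  keep <- eval(expr, df, parent.frame())

  # Keep the rows where the condition is TRUE (NA counts as "drop").
  df[!is.na(keep) & keep, , drop = FALSE]
}

# Small made-up data set so the example does not need dplyr's `starwars`.
people <- data.frame(
  name   = c("Luke", "Vader", "Leia"),
  height = c(172, 202, 150),
  gender = c("male", "male", "female"),
  stringsAsFactors = FALSE
)

# `height` and `gender` are not variables here, yet this works:
result <- my_filter(people, height < 200 & gender == "male")
print(result)
```

Calling `height < 200` on its own at the top level would fail with "object 'height' not found", which is exactly the gap between how the base language works and what the dplyr-style call makes it look like.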